ABSTRACT


Reliability  is  an  important  concern  of  any  computer  system.  No  matter  how 
carefully  designed  and  constructed,  computer  systems  fail.  The  rapid  and  systematic 
restoration  of  service  after  an  error  or  malfunction  is  always  a major  design  and 
operational  goal.  In  order  to  overcome  the  effects  of  a failure,  recovery  must  be 
performed  to  go  from  the  failed  state  to  an  operational  state.  This  thesis  describes 
a recovery  method  which  guarantees  that  a computer  system,  its  associated  data 
bases  and  communication  transactions  will  be  restored  to  an  operational  and 
consistent  state  within  a given  time  and  cost  bound  after  the  occurrence  of  a system 
failure. 

This  thesis  considers  the  optimization  of  a specific  software  strategy  - the  rollback 
and  recovery  strategy,  within  the  framework  of  a graph  model  of  program  flow 
which  encompasses  communication  interfaces  and  data  base  transactions. 
Algorithms  are  developed  which  optimize  the  placement  of  dynamic  recovery 
checkpoints.  Presented  is  a method  for  statically  pre-computing  a set  of  optimal 
decision  parameters  for  the  associated  program  model,  and  a run-time  technique  for 
dynamically  determining  the  optimal  placement  of  program  recovery  checkpoints. 


Chapter 1
INTRODUCTION 


1.1  Reliability  and  Recoverability 

The  concept  of  reliable  computing  has  existed  for  a long  time,  but  it  has  remained 
almost  exclusively  the  preserve  of  the  hardware  designer.  Hardware  structures  have 
been  developed  which  can  continue  to  provide  the  required  facilities  despite 
occasional  failures,  either  transient  or  permanent  outages  of  internal  components  and 
modules  [1,2]. 

Since  the  term  reliable  system  can  have  many  different  meanings,  it  is  important  to 
clearly  establish  just  what  is  desired.  One  does  not  try  to  build  a completely 
non-failing  device.  Instead,  one  introduces  redundancy  into  systems  of  intrinsically 
unreliable  components.  This  redundancy  may  be  in  hardware  - additional  hardware 
modules,  or  it  may  be  in  time  - additional  software,  or  a combination  of  both. 

Given  this  redundancy,  the  attempt  is  to  build  a system  which  will  recover 
automatically  within  a given  time  period.  Recovery  is  defined  as  the  continuation  of 
system  functions,  after  the  incidence  of  an  error,  with  data  integrity.  In  a total 
system  environment,  it  is  a problem  requiring  both  hardware  and  software  aids.  The 
fault  must  be  diagnosed  and  if  a broken  hardware  module  is  at  fault,  then  it  needs 
to  be  removed  from  the  system.  For  both  transient  and  solid  failures,  data  must  be 
reconstructed  to  a consistent  state  before  restarting. 

In many applications it is not necessary to operate continuously and perfectly. The
needed  reliability  of  a computer  system  is  a function  of  the  task  which  is  being 
performed.  A computer  failure  while  running  a numerical  analysis  program  is 
annoying  at  worst.  However,  in  such  areas  as  real-time  process  control,  spacecraft 
guidance,  and  air  traffic  control  a computer  failure  can  be  catastrophic.  When 
human  lives  are  at  stake  it  is  imperative  that  systems  perform  reliably.  For  these 
situations it is sufficient to operate correctly most of the time, so long as outages are
infrequent and are fixed with minimal human intervention, and, most importantly, so long
as the system recovers within a maximum time limit.

How one copes with infrequent brief outages depends on what one is trying to do.
For tasks which are tightly coupled to real time requirements, such as a real time
process control application, one method is to choose checkpoints at which to record the
state of the system, so that one can always recover by restarting from the checkpoint
just preceding an outage [3]. Other applications with tighter real-time constraints
may tolerate outages of only several seconds, or even milliseconds, before the system
suffers catastrophic, unrecoverable failure. Thus an aircraft guidance system might, at
times, tolerate only the briefest downtime, whereas an airline reservation system
could adapt to downtimes of considerably longer duration before the network's
general operation would be jeopardized.

Occasionally, despite all efforts, a system will break so catastrophically that it will be
unable  to  recover.  Given  that  there  is  sufficient  redundancy  in  a system,  a goal  is  to 
reduce  the  probability  of  such  a system  failure  to  the  probability  of  failure  of  all 
redundant  components.  The  presence  of  operable  system  components,  however,  is  not 
sufficient  to  guarantee  that  operation  will  be  resumed.  In  addition,  the  software 
must  be  able  to  survive  the  transients  accompanying  the  failure,  re-configure,  adapt 
to the remaining hardware, restore all faulty data to a consistent state, and continue
processing. 

In  certain  applications  one  is  also  concerned  with  maintaining  privacy  and  security 
along  with  reliability.  Security  is  concerned  with  protecting  a system  from  an  active 
external  agent  who  seeks  to  defeat  system  objectives.  Reliability  means  the  ability  of 
a system  to  overcome  or  recover  from  random  errors.  Security  and  reliability  are 
related, and often the techniques for providing reliability will interfere with the
maintenance  of  a secure  system.  These  interdependencies  will  be  elaborated  later 
when  the  system  model  is  described. 

The major approach to computer reliability proposes a redundant system design and
studies the interaction of the various redundancy techniques. The redundancy may
exist in the form of extra modules, such as triple modular redundancy (TMR)
techniques [4,5,6,7]; redundant software, such as Randell's method of acceptance tests
with alternate recovery blocks [8]; or the use of extra time to perform the function
of maintaining system integrity.

The  technology  of  reliable  computing  encompasses  theory  and  techniques  of  fault 
detection  and  correction,  modelling,  analysis,  synthesis,  and  the  architecture  of 
fault-tolerant  systems  and  their  evaluation  [9].  Reliability  and  recoverability  cannot 
be  added  on  to  a computer  system.  An  iterative  process  must  be  used  in  design. 

The  final  implementation  of  the  recovery  process  will  be  the  result  of  evaluation  of 
the  best  possible  data  integrity  assurance  at  the  minimum  cost  in  both  hardware  and 
software. 

In  a typical  recoverable  computer  system,  there  are  four  major  tasks  to  perform.  The 
first is fault or error detection. The second is the identification of the fault as a
hardware fault, a data transaction error, or possibly a transient and
non-recurring fault. The third is the modification of the system to eliminate the cause of
the fault, and the fourth is the system restart after reconfiguration.

Hardware  checking  and  diagnostics  can  be  used  for  assumed  failure  modes,  but  they 
must  be  supplemented.  In  addition,  program  errors  must  be  discovered.  This  can  be 
done  by  audit  programs  interleaved  with  operational  programs.  If  software 
information  is  to  be  audited,  it  must  be  redundant  and  be  able  to  satisfy  consistency 
relations. 

One  use  of  audit  programs  is  to  check  hardware  by  its  proper  execution  of 
operational  programs.  A second  use  is  to  check  for  data  integrity,  for  example,  the 
consistency  of  data  base  information.  A third  is  to  check  the  validity  of  the 
information  necessary  for  the  supervisory  program:  the  queues,  tasks,  system 
directories,  buffers  and  other  system  resources  allocated  by  software. 

Recovery programs are the software equivalent of hardware retry. Similarly, audit
algorithms are analogous to hardware checking. The better these algorithms are, the
greater the information integrity and the more valid is the recovery information.

Also,  without  accurate  diagnosis  of  the  cause  of  the  fault,  system  reconfiguration  will 
be  inaccurate  and  much  potential  fault  tolerance  will  be  wasted.  Audit  routines  are 
extremely  useful,  but  are  environment  dependent.  They  perform  hardware  checking  of 
the  components  and  data  transaction  consistency.  They  are  principally  responsible 
for  information  integrity  of  the  system.  Recovery  routines  are  activated  by  the  audit 
routines.  They  reconfigure  the  system,  and  then  restore  it  to  a consistent  state  via 
checkpoint  and  rollback  techniques. 


1.2  Hardware  Reliability

Reliability  enhancement  is  achieved  through  better  components  and  by  adding 
functional  redundancy  to  the  hardware  modules.  There  are  two  types  of  functional 
redundancy: fault masking redundancy and standby redundancy. Masking
redundancy  is  achieved  by  implementing  a function  or  module  so  that  it  is  inherently 
error  correcting,  for  example  TMR  techniques.  With  standby  redundancy,  spare 
modules  are  switched  into  the  system  when  working  modules  break  down.  The 
process  of  applying  tests  and  determining  whether  the  computer  is  fault  free  or  not 
is  generally  known  as  fault  detection  or  checkout.  Fault  location  and  isolation  is  the 
process  of  identifying  the  failures  within  the  smallest  possible  set  of  components. 

In  order  to  avoid  complete  systems  failures,  a failed  component  must  be  repaired  or 
replaced before its backup also breaks. The system must therefore report all
failures.  It  must  be  possible  to  remove  and  replace  any  component  while  the  system 
continues  to  run.  The  system  should  absorb  repaired  or  newly  introduced  modules 
gracefully. 


1.2.1  Reliable  Hardware  Systems  - TMR

Triple Modular Redundancy [4,7] was one of the earliest methods suggested for
obtaining a reliable system from less reliable components. The system output of
Fig. 1.1 is the majority of three identical components.


If  only  one  of  the  components  is  in  error,  the  system  output  will  not  be  in  error, 
since  the  majority  of  components  will  not  be  in  error.  Thus,  the  system  can  tolerate 
errors  in  any  one  component.  These  errors  may  be  transient  or  permanent. 
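
The voting step can be sketched in a few lines. This is only an illustration of majority voting over replicated outputs, not a model of any particular hardware voter; the function name `majority_vote` and the use of a list of outputs are assumptions made for the example.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the replicas.

    `outputs` holds one result per redundant component (three for TMR).
    Raises ValueError when no value is held by a strict majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among component outputs")
    return value


# One erroneous component out of three is masked by the vote.
print(majority_vote([42, 42, 17]))
```

If two of the three components disagree with each other and with the third, no majority exists and the sketch raises an error, which corresponds to the case the single-error assumption excludes.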


1.3  Hybrid  Reliability 

Software  recovery  after  fault  detection  with  hardware  self-repair  is  a hybrid 
utilization  of  reliability  techniques.  Various  strategies  are  used  to  reduce  the  impact 
of  interruptions  or  malfunctions  both  to  the  system  and  to  the  user.  Operating 
System  360  as  used  in  Model  65  [3]  is  equipped  with  a set  of  programs  called  the 
recovery  management  support  which  embodies  a number  of  methods.  The  recovery 
methods  depend  on  the  nature  of  the  malfunction.  In  the  input/output  area, 
rereading  of  input  data  with  parity  errors  is  common.  If  errors  persist  even  after 
repeated  retries  the  system  could  consider  reconstruction  of  damaged  data  (error 
correction)  if  possible.  In  the  case  of  processor  error,  the  instruction  may  be  retried 
if  feasible  (if  the  operands  were  not  modified  by  the  instruction).  The  most 
important technique OS 360/65 provides is checkpoints in all programs so that
programs  can  be  rolled  back  to  a previous  state  and  computation  resumed. 
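
The retry strategy for input with parity errors can be illustrated as follows. This is a hypothetical sketch and not the OS/360 recovery management code: the callable `read_block`, the even-parity convention, and the retry limit are all assumptions chosen for the example.

```python
def read_with_retry(read_block, max_retries=3):
    """Re-read a block until its parity checks out.

    read_block -- hypothetical callable returning (data_bits, parity_bit)
    Raises IOError when the error persists after `max_retries` re-reads,
    at which point error correction or rollback would be considered.
    """
    for _attempt in range(1 + max_retries):
        bits, parity = read_block()
        if sum(bits) % 2 == parity:      # even-parity check passes
            return bits
    raise IOError("persistent parity error")


# The first read is corrupted (wrong parity bit); the re-read succeeds.
attempts = iter([([1, 0, 1], 1), ([1, 0, 1], 0)])
print(read_with_retry(lambda: next(attempts)))
```

A transient error disappears on a re-read, so the retry succeeds; a persistent error exhausts the retries and must be handled by one of the other methods described above.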


1.3.1  Hybrid  Systems  - JPL  STAR

The JPL STAR (Self Testing And Repair) computer system, shown in Fig. 1.2,
obtains reliability by using TMR and spares [4].

Figure 1.2 - JPL STAR HYBRID COMPUTER BLOCK DIAGRAM

The  n spares  are  inactive  and  not  powered  on.  If  at  any  point  in  processing,  one  of 
the  three  active  modules  disagrees  with  the  majority,  the  disagreeing  module  in  the 
minority  is  switched  out  and  replaced  by  a spare.  The  spare  must  be  powered  up 
and  loaded.  One  method  of  loading  is  to  use  rollback  and  load  the  component  with 
the last saved error free state, and resume computation from that point [10]. If at
most one component (module) fails during a rollback cycle, and if the vote taker is
error  free,  the  system  is  fail  safe  until  all  the  spares  are  used  up. 
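
The switching of a disagreeing module for a spare can be sketched as below. This is a simplified illustration under stated assumptions, not the STAR design itself: module identifiers are plain strings, `compute` is a hypothetical stand-in for running one module from the saved state, and powering up and loading a spare is reduced to placing its identifier in the active set.

```python
from collections import Counter


def vote_and_repair(active, spares, saved_state, compute):
    """Run one voting cycle of TMR with standby spares (illustrative only).

    active      -- list of the three module ids currently voting
    spares      -- list of unpowered spare module ids
    saved_state -- last saved error-free state, used to load every module
    compute     -- hypothetical callable: (module_id, state) -> output
    Returns the voted output; updates `active` and `spares` in place.
    """
    outputs = [compute(m, saved_state) for m in active]
    voted, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("more than one module failed in this cycle")
    for slot, out in enumerate(outputs):
        if out != voted and spares:
            # Switch out the minority module and bring in a spare.
            active[slot] = spares.pop(0)
    return voted


faulty = {"B"}                            # module B produces a wrong result


def compute(module_id, state):
    return -1 if module_id in faulty else state + 1


active, spares = ["A", "B", "C"], ["S1", "S2"]
result = vote_and_repair(active, spares, 10, compute)
print(result, active, spares)
```

When the spares list is empty the sketch leaves the disagreeing module in place, so the arrangement degrades to plain TMR, which matches the observation that protection lasts only until the spares are used up.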


1.3.2  Hybrid  Systems  - PLURIBUS 

BBN has constructed a multiprocessor computer system, Fig. 1.3, which achieves both
increased operating power and increased system reliability through parallelism
and redundancy [1,11]. Their system architecture is known as Pluribus. The system
consists of processor units, memory units and input/output units. Each unit is in
itself a communication bus providing a physical housing, power and cooling, and a
communication discipline provided by a bus arbiter. All processor busses are
coupled to all memory busses and all input/output busses; likewise, all memory busses
are coupled to all input/output busses.


The  Pluribus  system  provides  extra  copies  of  every  vital  hardware  resource,  and 
isolation  between  copies  so  that  any  single  component  failure  will  impair  only  one 
copy,  leaving  a potentially  runnable  machine.  It  also  provides  software  facilities 
necessary  to  survive  transients  stemming  from  failure  and  to  adapt  to  running  on  the 
new  hardware  configuration. 


1.4  Software  Reliability 

Methods have been developed to improve reliability primarily by means of software
[8,12,13]. The cost associated with software methods is generally the additional time
and  storage  required  for  processing. 

In  many  real  time  systems  it  is  necessary  to  recover  rapidly  from  an  error.  One  way 
of  achieving  quick  recovery  is  to  fix  the  cause  of  the  error  (assuming  that  it  was  not 
transient), and then roll back and restart the program at a previously saved error free
state.  If  an  error  is  detected  while  a program  is  being  processed  and  if  the  error 
cannot  be  corrected  immediately,  it  may  be  necessary  to  run  the  entire  program 
again.  The  time  lost  in  running  the  program  may  be  substantial  and  in  some  real 
time  applications,  critical.  Software  methods  for  enhancing  reliability  assume  that 
the  systems  programs  are  written  correctly.  Software  techniques  utilizing  incorrect 
programs  will  not  improve  system  reliability. 

Since  software  is  an  expensive  item,  and  software  errors  have  become  very  costly,  a 
need  arises  to  validate  and  verify  the  correct  operation  of  a software  package  before 
it  is  committed  to  regular  use.  Several  approaches  to  the  formal  validation  of 
programs  have  been  studied  [16,17,18].  These  methods  either  put  considerable 
burden  of  the  validation  on  the  programmer  or  require  that  the  input/output 
assertions  provided  in  the  program  be  verified  by  a sophisticated  theorem  proving 
mechanism.  An  abstract  model  of  computations  in  a program  and  a method  of 
proving  that  a specific  program  will  always  run  properly  is  provided  by  King  [19]. 
The  complexity  of  most  of  the  program  verifying  techniques  indicates  the  need  for 
much  simpler  methods  which  could  provide  partial  validation  of  large  programs. 


1.4.1  Reliable  Software  Systems  - Randell 

Randell  [8]  has  developed  a method  for  structuring  programs  by  the  use  of  recovery 
blocks. This is illustrated in Fig. 1.4. His aim is to provide the dependable error
detection  and  recovery  facilities  which  can  cope  with  errors  caused  by  software 
design  inadequacies,  particularly  in  the  system  software,  rather  than  the 
malfunctioning  of  hardware  components. 

Figure 1.4 - RANDELL'S RECOVERY BLOCK SCHEME

Randell's scheme is the software analog of hardware standby sparing. As the system
operates, tests are made on the acceptability of the results generated by each software
module. Should one of these checks fail, a spare software module is switched in to
take the place of the erroneous module. The spare software module is not a copy of
the main software component, but one utilizing an alternate and independent design,
so that it, hopefully, can cope with the circumstances which caused the main
component to fail. The technique uses recovery blocks, in which the acceptance test of
a task function is to be satisfied by the primary alternative or else by one of a
number of secondary alternatives.

His  recovery  block  scheme  incorporates  a solution  to  the  problem  of  switching  to  the 
use  of  the  spare  component  and  of  repairing  damage  done  to  non-local  data  via  a 
recursive  cache  structure.  If  the  backed  up  program  has  modified  global  variables, 
these  variables  must  be  restored  to  their  previous  values,  or  the  program  could 
operate on incorrect data when it is restarted. If the variables that a program
has  modified  were  used  by  another  program,  then  the  program  that  used  the 
modified  variables  must  also  be  backed  up. 
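
The control structure of a recovery block can be sketched as follows. This is a minimal illustration, assuming the non-local data fits in one dictionary that is copied whole on entry; Randell's recursive cache records only the variables actually modified, which this sketch does not attempt. The names `recovery_block`, `primary`, `secondary`, and `accept` are hypothetical.

```python
import copy


def recovery_block(state, acceptance_test, alternates):
    """Try each alternate in turn until one passes the acceptance test.

    state           -- dict of non-local variables the alternates may modify
    acceptance_test -- callable(state) -> bool
    alternates      -- primary first, then independently designed spares
    On success `state` holds the accepted results. After each failure the
    state is restored to its values at block entry.
    """
    saved = copy.deepcopy(state)              # recovery point at block entry
    for alternate in alternates:
        try:
            alternate(state)
            if acceptance_test(state):
                return state
        except Exception:
            pass                              # a crash counts as a failed test
        state.clear()
        state.update(copy.deepcopy(saved))    # undo damage before the next try
    raise RuntimeError("all alternates failed the acceptance test")


def primary(s):                               # deliberately faulty design
    s["data"] = sorted(s["data"])[:-1]        # loses the largest element


def secondary(s):                             # independent spare design
    s["data"] = sorted(s["data"])


def accept(s):
    d = s["data"]
    return len(d) == s["n"] and all(a <= b for a, b in zip(d, d[1:]))


state = {"data": [3, 1, 2], "n": 3}
recovery_block(state, accept, [primary, secondary])
print(state["data"])
```

The primary fails the acceptance test because it drops an element, the modified data is restored, and the secondary then produces an acceptable result. Copying the whole state is the simplest way to guarantee that the spare never operates on data damaged by the failed alternate.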

Randell's  model,  though,  does  limit  concurrent  processing,  all  data  transactions  with  a 
data  base,  and  communications  with  other  external  processes. 


1.4.2  Software  Reliability  - Russell's  Extensions 

Russell  [12]  has  extended  Randell's  work  on  recovery  blocks  and  recursive  cache  by 
presenting  a system  design  that  supports  the  restoration  of  system  state  in  a system 
of  asynchronous  communicating  parallel  processes.  He  provides  send/receive 
primitives  which  implement  interprocess  communications  through  messagelists, 
illustrated  in  Fig.  1.5.  Also  provided  are  recovery  primitives  which  perform  the 
consistent  state  restoration  of  the  system. 


Figure 1.5 - RUSSELL'S PROCESS CACHE AND MESSAGELIST SCHEME

The complexity of state restoration is analyzed, and is shown to be dependent upon
the structure of the messagelist and on its interconnection with the processes. More
efficient  recovery  is  possible  if  the  system  is  constrained  to  insert  slackmarks  into 
the  process  cache  before  executing  the  send/receive  operations.  Russell  finds  bounds 
for  the  amount  of  state  restoration  which  must  be  performed  to  restore  the  system  to 
a previously  consistent  recovery  point  after  the  occurrence  of  an  error. 


1.4.3  Software  Reliability  - Chandy's  Work

Chandy  [13,14]  has  presented  a model  of  computation  for  process  control  systems. 
Each job run by the system is partitioned into several tasks, Fig. 1.6, by the
programmer.  The  programmer  must  construct  a task  graph  which  represents  the 
program  control  flow.  Associated  with  each  vertex  of  the  graph  is  the  maximum 
execution  time  for  the  corresponding  task. 

His  graph  models  the  synchronous  program  execution  in  which  multiple  edges  leading 
away  from  a node  represent  possible  control  flow  points  of  which  only  one  may  be 
taken  at  run  time.  Chandy's  model  arrives  at  a means  of  optimally  inserting 
checkpoints  along  edges  of  program  flow.  His  model  includes  neither  concurrent 
processing  nor  transactions  with  an  external  data  base  or  communications  with 
external  processes.  It  also  requires  a priori  knowledge  of  node  execution  times. 


1.5  Hardware  - Software  Tradeoffs

Consider an aerospace system such as an air-traffic control system. The system has a
specific goal which must be accomplished within a certain specified amount of time.
A large penalty is incurred if the system does not accomplish its mission. A lateness
penalty is incurred if the time taken to accomplish the goal exceeds this limit. The
longer the time taken, the larger the penalty.

Several models have been constructed for designing reliable machines from
intrinsically less reliable components by using redundancy. These are hardware
methods.  The  cost  of  a hardware  method  is  the  cost  required  to  build  and  maintain 
the  redundant  hardware.  The  cost  associated  with  software  methods  of  achieving 
reliability  is  generally  the  additional  time  and  space  required  for  processing,  and 
possibly  the  actual  manpower  cost  associated  with  providing  this  software. 

In  some  systems,  software  methods  have  to  be  ruled  out  since  the  amount  of  time 
available  to  complete  a task  is  too  short  to  permit  methods  which  require  additional 
time.  In  other  cases,  the  longer  the  system  takes,  the  more  expensive  are  the 
consequences. 


Studies  have  been  done  by  Ramamoorthy,  Chandy  and  Cowan  [15]  which  attempt  to 
construct  a framework  in  which  hardware  and  software  methods  can  be  compared  for 
cost  effectiveness.  In  essence,  their  method  compares  the  costs  of  delays  introduced 
by  time  redundancy  techniques  with  the  costs  of  hardware  in  hardware  redundancy 
schemes.  They  analyze  TMR,  hybrid,  and  TMR  with  standby  spares  (self-purging), 
obtaining  techniques  for  computing  a set  of  indices  for  comparison  of  reliability 
methods.  However,  the  problem  of  selecting  the  optimal  mix  of  redundancy 
strategies  for  a system  is  very  difficult  because  of  the  numerous  cost-effectiveness 
parameters  which  can  be  adjusted. 

1.6  Summary 

A computer system for error recovery must provide four capabilities:

1. a means for detecting that an error occurred,

2. a means for locating and diagnosing the error,

3. a means for correcting any adverse effects the error has caused, and

4. a means for reconfiguring the system so that the error does not recur.
Retrying a failed operation will succeed if the error was a transient error.


In  the  remainder  of  this  thesis,  detection,  location,  diagnosis,  and  reconfiguration  are 
not  considered.  The  optimal  restoration  of  the  correct  system  state  is  studied. 

The  algorithms  and  programs  used  for  the  recovery  scheme  are  assumed  to  be  error 
free  and  to  function  properly. 


1.7  Organization  of  Thesis 

In Chapter 2, "A Model of Program Structure for Rollback and Recovery," the
particular  characteristics  of  a highly  available  computer  system  are  presented,  along 
with  a model  of  program  behavior  which  includes  communication  interfaces  and  data 
base  transactions.  The  concept  of  the  statically  pre-computed  decision  parameter  set 
which  minimizes  the  expected  program  execution  time  is  introduced,  and  an 
algorithm  for  its  use  at  run-time  is  presented. 

In  Chapter  3,  "Algorithm  Which  Optimizes  the  Insertion  of  Recovery  Checkpoints," 
the  MERT  algorithm  is  presented  and  proven  to  minimize  the  total  expected  run 
time  of  the  program. 

Chapter  4,  "Examples  Illustrating  the  Dynamic  Insertion  of  Recovery  Checkpoints," 
presents  a (typical)  program  graph  which  is  analyzed  by  the  MERT  algorithm.  This 
analysis  determines  the  optimal  decision  parameter  set,  which  is  then  used  to 
illustrate  the  run-time  program  behavior  for  several  different  runs  of  this  same 
program. 


Chapter 2

A MODEL  OF  PROGRAM  STRUCTURE  FOR  ROLLBACK 
AND  RECOVERY 


Program  checkpointing  and  rollback  is  a method  of  enhancing  computer  system 
reliability.  Program  checkpointing  is  the  process  of  making  a copy  of  program  state 
in  secondary  storage.  Rollback  is  the  re-loading  of  this  state  upon  the  occurrence  of 
an  error,  and  the  restart  of  the  system. 
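
The two operations can be sketched directly. This is a minimal illustration, assuming the program state is a single picklable Python object and that a file on disk stands in for secondary storage; the function names `checkpoint` and `rollback` and the file name are hypothetical.

```python
import os
import pickle
import tempfile


def checkpoint(state, path):
    """Write a copy of the program state to secondary storage."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    # Rename only after the write completes, so a failure during
    # checkpointing cannot destroy the previous checkpoint.
    os.replace(tmp, path)


def rollback(path):
    """Re-load the most recently saved state."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ckpt.bin")
    state = {"step": 3, "total": 60}
    checkpoint(state, path)
    state["total"] = -999          # simulate damage caused by an error
    state = rollback(path)         # restart from the saved state
    print(state)
```

Writing to a temporary file first is a design choice of this sketch: the checkpoint itself must survive an error that occurs while it is being taken.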

The  objective  and  constraints  may  vary  considerably  from  system  to  system.  The 
system  being  considered  is  assumed  to  have  high  availability.  However,  if  an  error 
does  occur,  it  must  recover  very  rapidly  since  a delay  in  performing  system  functions 
may  have  catastrophic  results.  The  objective  is  to  assure  that  every  recovery  be 
rapid.  For  our  system,  an  explicit  constraint  is  assumed:  the  interval  of  time  taken 
to  recover  must  not  exceed  a given  quantity,  M time  units.  In  actual  practice  M will 
depend  on  the  system,  may  vary  from  job  to  job  in  a given  system,  and  may  depend 
upon  the  actual  stage  in  processing  for  a particular  job.  Our  system  is  assumed  to 
have  sufficient  processing  power  to  perform  its  primary  task  and  to  support  the 
overhead  which  is  associated  with  rollback  and  recovery.  The  objective  of  our 
analysis  is  to  minimize  this  associated  rollback  and  recovery  overhead,  with  the 
constraint that recovery never exceed M time units.


2.1  Optimal  Placement  of  Rollback  Points

The  checkpointing  strategy  may  be  static  or  dynamic.  Static  checkpointing  requires 
carrying  out  checkpointing  at  fixed  intervals  regardless  of  their  immediate  necessity. 
In  a dynamic  checkpointing  environment,  the  placement  of  checkpoints  will  vary 
from  one  run  of  a program  to  the  next.  This  variation  will  depend  upon  the 
dynamic  runtime  characteristics  of  the  program.  Dynamic  checkpointing  yields 
higher system availability than static checkpointing because it takes into account the
actual rollback and recovery requirements.
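
The difference between the two strategies can be illustrated with a small sketch. The dynamic rule used here is a simple greedy threshold chosen for illustration and is not the algorithm developed in this thesis. The sketch also assumes that recovery cost is just the work performed since the last checkpoint, ignoring the cost of saving and re-loading state. All function names are hypothetical.

```python
def static_checkpoints(task_times, every):
    """Checkpoint before every `every`-th task, regardless of elapsed time."""
    return [i for i in range(1, len(task_times)) if i % every == 0]


def dynamic_checkpoints(task_times, M):
    """Checkpoint before task i only when omitting the checkpoint would let
    the work since the last checkpoint exceed the recovery bound M."""
    points, since_last = [], 0.0
    for i, t in enumerate(task_times):
        if t > M:
            raise ValueError("a single task exceeds the recovery bound M")
        if since_last + t > M:
            points.append(i)       # checkpoint on the edge entering task i
            since_last = 0.0
        since_last += t
    return points


def worst_rollback(task_times, points):
    """Longest stretch of work that an error could force to be repeated."""
    worst, since_last = 0.0, 0.0
    for i, t in enumerate(task_times):
        if i in points:
            since_last = 0.0
        since_last += t
        worst = max(worst, since_last)
    return worst


M = 5
run_a = [1, 1, 1, 1, 1, 1]         # a run in which every task is short
run_b = [4, 4, 1, 4, 4, 1]         # another run of the same program
for run in (run_a, run_b):
    s = static_checkpoints(run, every=2)
    d = dynamic_checkpoints(run, M)
    print(s, worst_rollback(run, s), d, worst_rollback(run, d))
```

On the first run the static schedule takes two checkpoints where the dynamic rule needs only one. On the second run the static schedule lets eight time units accumulate between checkpoints and so violates the bound M = 5, while the dynamic rule stays within it.
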

The optimal placement of recovery checkpoints necessitates that the programmer
analyze the program flow and make estimates on certain branching parameters, task
execution times, etc. The more often a program is run, the more cost-effective and
beneficial  is  the  optimal  placement  of  recovery  checkpoints.  Programs  whose  total 
processing  time  is  shorter  than  a maximal  crucial  recovery  time  do  not  need  recovery 
checkpoints.  A program  which  is  worth  analyzing  for  the  optimal  placement  of 
rollback  points  will  have  a combination  of  these  attributes: 


* It must be crucial to the application of the program that error recovery be
accomplished quickly and systematically.

* It is necessary to maintain correct operation; it is imperative that errors be
detected and corrected.

* The same program must be run a number of times, as a production program.

* The program will require a substantial amount of processing time.


There  are  many  application  areas  which  possess  these  attributes: 


* Real time process control of expensive or dangerous components.

* Applications where human lives are endangered by extended system
downtime, such as air traffic control.

* Applications in which many people are dependent upon the continuous
service of an essential or expensive commodity, such as electronic funds
transfer (EFT) in banking or an airline reservation system.

* Spacecraft guidance, navigation, and life support systems.

There are several different parameters to consider when deciding the optimal
placement of recovery checkpoints. The choice of whether and where to insert a checkpoint
depends upon the importance of speedy error recovery. In certain real time
applications  it  is  crucial  that  a program  reliably  run  to  completion  bounded  by  a 
fixed  maximal  time  limit.  In  other  applications,  the  loss  incurred  due  to  a system 
failure  is  only  the  computer  time  wasted. 


2.2  Program  Graph  Model 
2.2.1  Program  Graph 

Our  algorithm  for  the  optimal  placement  of  recovery  checkpoints  uses  a sequential 
graph  model  to  describe  a program.  Similar  graph  models  have  been  used  for  the 
analysis  of  program  structure  and  behavior  [20,21,22,23].  We  require  that  a 
programmer  analyze  his  program  and  represent  it  as  a sequence  of  tasks.  This 
analysis  could  be  done  manually  from  a flow  chart,  or  could  be  accomplished 
automatically  with  the  aid  of  an  analysis  program.  A task  will  consist  of  a number 
of  machine  instructions,  and  will  involve  an  amount  of  processing  time  which  is 
bounded  by  the  maximal  recovery  time  M. 

Let a program be represented by a directed graph, as in Fig. 2.1, where each node i
in the graph corresponds to a task i in the program, and edge (i,j) exists if task j
may directly succeed task i with probability p_ij.
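
One possible in-memory representation of such a graph is sketched below. The class and method names are hypothetical, and the numbers in the example are invented. The check that the branch probabilities leaving a task sum to one is an assumption of this sketch, while the check against M reflects the requirement that each task's processing time be bounded by the maximal recovery time.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramGraph:
    times: dict = field(default_factory=dict)   # task i -> estimated time
    edges: dict = field(default_factory=dict)   # task i -> {successor j: p_ij}

    def add_task(self, i, time):
        self.times[i] = time
        self.edges.setdefault(i, {})

    def add_edge(self, i, j, p_ij):
        """Record that task j may directly succeed task i with probability p_ij."""
        self.edges[i][j] = p_ij

    def validate(self, M, tol=1e-9):
        for i, t in self.times.items():
            if t > M:
                raise ValueError(f"task {i} exceeds the recovery bound M")
            succ = self.edges[i]
            if succ and abs(sum(succ.values()) - 1.0) > tol:
                raise ValueError(f"probabilities out of task {i} do not sum to 1")
        return True


g = ProgramGraph()
for task, t in [(1, 2.0), (2, 3.0), (3, 1.5), (4, 1.0)]:
    g.add_task(task, t)
g.add_edge(1, 2, 0.7)              # after task 1, task 2 follows 70% of the time
g.add_edge(1, 3, 0.3)
g.add_edge(2, 4, 1.0)
g.add_edge(3, 4, 1.0)
print(g.validate(M=5.0))
```

Task 4 has no successors and is treated as a terminal node, so it is exempt from the probability check.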


Figure 2.1 - GRAPH MODEL OF PROGRAM BEHAVIOR


2.2.2  Program Running Time

The analysis makes use of estimates made by the programmer on the expected
amount of processing time, t_i, required by a task i. It is impossible to design an
algorithm which, given any program, determines the time that may be required to
process each task in the program. However, it is possible for a programmer to
obtain estimates of average or worst case bounds for the tasks of his particular
program; these times could be obtained through a measurement system. In many
computer installations, programmers submit estimates of the maximum time required
to process their jobs. It is important to note that in installations where a
programmer is allowed to specify recovery checkpoints, he must make estimates of
this  sort,  and  then  make  intuitive  decisions  based  on  these  estimates.  Our  objective 
is  to  clarify  and  formalize  this  decision  making  process.  The  accuracy  of  the 
decisions  clearly  depends  on  the  accuracy  of  the  estimates.  The  decision  to  insert 
recovery  checkpoints  depends  on  the  importance  of  speedy  error  recovery:  the 
penalty  incurred  if  a program  does  not  run  to  completion  in  a prescribed  amount  of 
time. 

Obtaining  a program  graph  from  a program  is  not  inexpensive.  The  program  must 
be  analyzed  and  estimates  made  of  several  parameters.  In  many  cases,  the  advantage 
gained  in  having  tailor  made  recovery  checkpoints  is  not  worth  the  time  spent  to 
obtain  a program  graph.  In  these  cases  static  checkpoints  at  fixed  intervals  are 
sufficient.  However,  in  those  cases  where  the  costs  of  slow  error  recovery  are  high, 
the  advantage  of  dynamic  recovery  checkpoints  outweighs  the  time  spent  to  construct 
the  program  graph.  We  are  concerned  with  cases  (Section  2.1)  of  this  latter  type. 


2.2.3  Program State


At any stage in the processing of a program, certain information is required by the
program for computation to proceed successfully. A state at any stage in the
processing of a program will be defined as the information (program variables, state
of the input/output devices, secondary storage) which may be subsequently used by
the program.


At  each  edge  (i,j),  one  may  dynamically  choose  to  insert  a recovery  checkpoint.  If  a 
checkpoint  is  inserted  on  edge  (i,j),  then  after  task  i is  completed  and  before  task  j is 
begun,  the  state  of  the  system  is  saved  in  secondary  storage.  Any  state  saved  prior  to 
this  (i,j)  rollback  point  is  accumulated,  allowing  subsequent  recovery  attempts  to 
make  use  of  multiple  recovery  checkpoints  (Section  2.4).  Thus,  when  an  error  occurs, 
an  attempt  is  made  to  restart  the  program  from  the  most  recently  saved  state. 


2.2.4  Branching Probabilities


Associated with each program branch is the probability, p_ij, that branch (i,j) will be
followed. For each node i, it follows that

\[ \sum_{j} p_{ij} = 1 \]

Furthermore, the probabilities p_ij are assumed to be fixed and independent of the
way the program reached the particular branch (i,j). This will be recognized as a
Markov model assumption. Although this is not always valid, it is a simplifying
assumption to aid the analysis. If we do not assume a Markov model, the resulting
analysis is overly complex [23].
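
As a rough illustration of the model described so far, the sketch below stores a program graph as a mapping from each task to its successors and branching probabilities, checks that the probabilities on the edges leaving each node sum to 1, and samples one run under the Markov assumption. The task names, the probability values and the helper names (check_branching, sample_run) are hypothetical and are not taken from Fig. 2.1.

```python
import random

# Hypothetical program graph: GRAPH[i][j] = p_ij, the probability that
# task j directly succeeds task i. Terminal tasks have no successors.
GRAPH = {
    "a": {"b": 0.7, "c": 0.3},
    "b": {"d": 1.0},
    "c": {"d": 0.4, "e": 0.6},
    "d": {"f": 1.0},
    "e": {"f": 1.0},
    "f": {},
}

def check_branching(graph, tol=1e-9):
    """Return the non-terminal nodes whose outgoing p_ij do not sum to 1."""
    return [i for i, succ in graph.items()
            if succ and abs(sum(succ.values()) - 1.0) > tol]

def sample_run(graph, start, rng):
    """Sample one task sequence; each branch depends only on the current
    task (the Markov assumption), not on the path taken to reach it."""
    path = [start]
    while graph[path[-1]]:
        succ = graph[path[-1]]
        nxt = rng.choices(list(succ), weights=list(succ.values()))[0]
        path.append(nxt)
    return path

bad_nodes = check_branching(GRAPH)
run = sample_run(GRAPH, "a", random.Random(0))
```

Different seeds give different task sequences, which corresponds to the sequencing of tasks changing from one run to the next.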


2.2.5  Acyclic  Program  Graph 

The sequencing of tasks may change from one run to the next due to the conditional
branching probabilities p_ij. In the graph model, it is assumed that no task is
repeated. The program graph G is an acyclic directed graph. If there is a loop in
the program, then each iteration may be modeled as a distinct task, or the iterations
may be combined into a sequence of one or more tasks. Beizer [23] presents a
method for transforming a cyclic program graph into one which is acyclic.
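
The acyclicity requirement can be checked mechanically. The following sketch uses Kahn's topological sort, which is a standard technique and not the transformation given by Beizer [23]; the three example graphs are hypothetical. The third graph shows a two-iteration loop modeled as distinct tasks, as suggested above.

```python
from collections import deque

def topological_order(graph):
    """Return the tasks in a topological order, or None if the graph
    contains a cycle (Kahn's algorithm)."""
    indegree = {node: 0 for node in graph}
    for succ in graph.values():
        for j in succ:
            indegree[j] += 1
    ready = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        for j in graph[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)
    # Nodes on a cycle never reach indegree 0 and are left out of the order.
    return order if len(order) == len(graph) else None

acyclic = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
looping = {"a": ["b"], "b": ["a", "c"], "c": []}   # b may branch back to a
unrolled = {"a1": ["b1"], "b1": ["a2", "c"],       # same loop, two iterations,
            "a2": ["b2"], "b2": ["c"], "c": []}    # each modeled as its own task
```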


2.2.6  Error  Latency  and  Detection 

An error which is not detected as soon as it occurs may propagate. For example, if
an erroneous input is used to update a data element, the updated item will also be in
error. When an error is detected, it is not generally possible to ascertain when the
error occurred, nor the amount of error propagation. The error latency is the period
of time between the occurrence of an error and its detection. The distribution of
error latency depends upon the method used for error detection. If error detection
occurs intermittently at a fixed interval T, then the error latency is not likely to
exceed T. The error latency distribution influences the amount of error
propagation. If an error has a short lifetime, it is less likely to be used to update
other data items, and therefore less likely to propagate.

Error detection may be performed continuously or intermittently. Parity checking is
one example of continuous error detection since it can proceed as long as the system
is available. Other techniques may perform integrity/consistency checking only at
discrete intervals. The ability to localize the extent of error has a beneficial impact
on the recovery process.

Suppose an error occurs while a task is being processed, and suppose it is not
diagnosed (Fig. 2.5). If a checkpoint occurs immediately after the task is complete,
then the information that is saved at the checkpoint will be erroneous. Subsequently,
if an error is diagnosed, this erroneous information will be loaded, and the system
will continue processing from the recovery checkpoint. Eventually the same error
will be diagnosed again. If the same error is detected after rolling back, we can
conclude that an undiagnosed error occurred before the last recovery checkpoint, or
that there exists a permanent malfunction which the system reconfiguration has not
corrected. If an undiagnosed error has occurred, we have the potential of rolling
back again, to a previously saved recovery checkpoint, in an attempt to reload a
correct copy of the program state.
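
The repeated rollback described above can be sketched as a loop over the saved checkpoints, newest first. This is only an illustration: the function name recover, the rerun_ok callback (standing in for "reload this state, rerun, and see whether the same error is diagnosed again") and the example states are assumptions, not part of the method as specified here.

```python
def recover(checkpoints, rerun_ok):
    """Try saved checkpoints from most recent to oldest.

    checkpoints: list of saved states, oldest first.
    rerun_ok:    hypothetical callback that reloads a state, reruns the
                 program, and returns True if the error does not reappear.
    Returns the index of the checkpoint that worked, or None if every
    rollback fails (for example, because of a permanent malfunction).
    """
    for index in range(len(checkpoints) - 1, -1, -1):
        if rerun_ok(checkpoints[index]):
            return index
    return None

# Three checkpoints; an undiagnosed error corrupted the state before
# the most recent one was saved.
saved = [{"x": 1, "corrupt": False},
         {"x": 2, "corrupt": False},
         {"x": 3, "corrupt": True}]
used = recover(saved, lambda state: not state["corrupt"])
```

Here the most recent checkpoint is rejected and the second one is used, at the price of rerunning more of the program.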


2.3  Data  Set  Interactions 

The  total  state  of  the  system  which  must  be  saved  at  rollback  recovery  checkpoints 
consists  of  both  program  state  (memory,  registers,  etc.)  and  the  complete  state  of  all 
peripheral  devices  which  interact  and  exchange  information  with  the  program.  Let  us 
refer  to  all  interacting  external  devices  as  data  sets. 


2.3.1  Data  Sets 

Data  sets  include  all  input  and  output  devices,  such  as  user  terminals,  measuring 
transducers,  system  data  bases,  etc.  If  a system  failure  occurs,  and  operation  returns 
to  a previously  established  rollback  recovery  point,  it  must  be  possible  to  insure  that 
all  data  sets  are  returned  to  consistency  with  the  program  state  at  this  rollback  point. 

Rolling  back  data  sets  implies  that: 


1. input devices must be able to furnish again the identical sequence of data,

2. output results, some of which may be erroneous, must be revoked,

3. additions, deletions and updates to data bases must be undone, and

4. dialogue with users at remote terminals must be duplicable.


2.3.2  Transaction Journals


In  order  to  provide  for  this  backup  capability,  the  system  must  continuously 
maintain  a record  of  pertinent  transactions  which  occur  with  each  data  set.  In  its 
simplest  form,  the  transaction  journal  is  a record  of  input  and  output  transactions 
performed  by  the  program.  Each  item  is  written  onto  secondary  storage  before  it  is 
processed.  For  an  update  action,  a typical  journal  entry  might  consist  of  the  item 
name  which  is  being  updated,  its  old  value,  and  its  new  value.  The  content  and  use 
of  these  journals  is  dependent  upon  the  characteristics  of  the  individual  data  set. 

Three  types  of  transaction  journal  need  to  be  generated  by  the  system  so  that  data  set 
rollback  may  take  place: 


2.3.2.1  Backup Journals

The  backup  journal  is  used  to  restore  the  data  set  to  the  earlier  state  which  existed 
at  the  last  rollback  point.  Backup  journals  provide  a record  of  those  input  and 
output  transactions  between  the  program  and  data  set  which  modify  the  state  of  the 
data  set.  The  backup  journal  need  only  record  those  transactions  which  result  in  an 
update  being  made  to  the  data  set.  Individual  entries  in  the  backup  journal  include 
that  information  necessary  to  undo  state  modifications,  such  as  additions  and 
deletions  to  a data  base. 


2.3.2.2  Revoke Journals

The  revoke  journal  provides  the  ability  to  either  revoke  any  erroneous  output  which 
was  issued  by  the  real  data  set  since  the  last  recovery  checkpoint,  or  to  indicate  the 
extent  of  this  erroneous  data  so  that  an  (external)  agent  might  take  appropriate 
actions  to  recall  or  undo  the  effects  of  this  output  data.  Entries  in  the  revoke 
journal  would  indicate  that  output  which  might  be  potentially  erroneous  due  to  the 
occurrence of a system error.


2.3.2.3  Replay Journals


The  replay  journal  provides  the  ability  to  simulate  the  replay  of  a non-repeatable 
input  data  set,  such  as  a reader,  measuring  transducer,  etc.  Entries  into  the  replay 
journal  would  include  all  those  input  data  transactions  which  the  data  set  could  not 
again  furnish  to  the  program. 


2.3.3  Virtual  Data  Sets 

To facilitate rollback and restart, it is desirable to treat all classes of data sets in a
similar and general manner. For our model this is accomplished by providing an
interface process between the real (physical) data set and the program. This
interface process creates a virtual data set as seen by the program. A virtual data
set is one which possesses the same attributes as the real data set, but in addition, is
capable of being backed up to a previously defined state upon the detection of a
system error. This capability is independent of the actual physical attributes and
limitations of the real data set. This is shown in Fig. 2.2.


2.3.3.1  Virtual Data Set Interface Processes

The virtual data set capability is provided by the data set interface process, whose
structure and actions are tailored to the characteristics of the real data set. The
interface process and its attendant transaction journal supplement the real data set,
creating those capabilities necessary to simulate a virtual data set correctly. Both
the interface process and the transaction journal are tailored specifically to create a
virtual data set interface for the associated real data set.

Depending  upon  the  characteristics  of  the  real  data  set,  the  interface  process  makes 
use  of  the  transaction  journal  to  provide  this  virtual  capability. 


Figure 2.2 - Virtual Data Set Interface Process


2.3.3.2  Recovery Commands to the Interface Process

In order to ensure data set consistency during the rollback and recovery procedure,
all virtual data sets must recognize and act upon control commands from the
program. These commands are:

1. MARK (data set name)

2. RESTORE (data set name)


When  a recovery  checkpoint  is  inserted  in  the  program,  say  at  edge  (i,j),  the  program 
state  is  saved  in  secondary  storage,  and  all  virtual  data  sets  are  notified  of  this 
insertion  by  the  MARK  control  command.  Upon  receipt  of  the  MARK  command, 
the  virtual  data  set  interface  process  takes  the  appropriate  action  necessary  to  ensure 
that  the  real  data  set,  at  a later  point  in  time,  may  be  restored  to  a state  of 
consistency with this recovery checkpoint so that succeeding transactions may be
again  duplicated.  Typical  actions  taken  at  the  receipt  of  the  MARK  command  might 
include  clearing  the  previous  journal  entries,  noting  the  current  system  clock  time, 
etc.  Succeeding  interactions  between  the  program  and  the  virtual  data  set  produce 
journal  entries,  the  content  of  which  is  determined  by  the  characteristics  of  the  real 
data  set. 

Upon the occurrence of a system error, the program state is reloaded from the latest
rollback checkpoint, and the virtual data sets are notified of the error by the
RESTORE command. Upon receipt of the RESTORE command, the virtual data set takes
the necessary action to restore its real data set, enabling subsequent input/output
transactions with the program to be duplicated and replayed.


2.3.3.3  Classes of Data Sets

Let  us  divide  the  various  types  of  data  sets  into  equivalence  classes,  each  of  which 
possesses  common  characteristics  as  seen  by  the  virtual  data  set  interface  process. 
The  goal  of  this  classification  scheme  is  to  systematically  provide  a virtual  data  set 
which: 


1. exhibits the same operational characteristics as the real data set, during
error-free system operation,

2. may be notified of the insertion of a program recovery checkpoint via a
MARK command,

3.  upon  the  receipt  of  a RESTORE  command  (at  the  occurrence  of  a system 
error)  restores  its  real  data  set  to  the  state  consistent  with  the  last  rollback 
checkpoint,  and 

4.  will  furnish  the  identical  sequence  of  input  and  output  actions  while  the 
program  is  being  rerun  during  the  recovery  process. 


Data  Sets  Requiring  No  Transaction  Journal 

The  simplest  type  of  peripheral  device  to  provide  a virtual  data  set  interface  for  is 
one  which  possesses  internal  state  which  is  not  modified  by  data  transactions,  and 
which  is  fully  repeatable.  This  type  of  device  will  be  read-only,  and  no  rollback 
state  restoration  need  be  performed,  nor  transaction  journals  generated.  A read  only 
random  access  disk  file,  or  a table  lookup  ROM  are  examples  of  this  type  of  data 
set.  The  MARK  and  RESTORE  commands  are  null  operations  for  these  devices. 


Data Sets Requiring Backup Journals

Input/output transactions to a data set with internal state will be non-repeatable if
the transaction modifies the internal state of the device. This state modification may
be the result of an update or write action, such as updating a record in a random
access disk file; or it may be the result of physical movement, such as a read operation
which repositions the input pointer of a magnetic tape unit.

A backup journal needs to be generated for this class of data set. The MARK
command  causes  the  current  state  of  the  device  to  be  entered  onto  the  journal  so  that 
subsequent  state  modifications  may  be  undone.  Each  data  transaction  which  updates 
this  state  must  generate  a journal  entry  which  can  later  be  used  to  undo  the 
modification.  Systems  exist  which  make  use  of  sophisticated  backup  mechanisms 
[24]. 
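
A minimal sketch of a backup-journal interface process follows, assuming the real data set behaves like a Python dictionary. The class name BackedUpDataSet and its methods are hypothetical. Two simplifications are assumed: the journal is held in memory rather than written to secondary storage before each update, and MARK clears the earlier entries, so only the most recent checkpoint can be restored.

```python
class BackedUpDataSet:
    """Virtual data set over a dict-like 'real' data set, with a backup journal."""

    _MISSING = object()  # marks an item that did not exist before the update

    def __init__(self, real):
        self.real = real      # the real (physical) data set
        self.journal = []     # backup journal entries: (item name, old value)

    def mark(self):
        """MARK: a checkpoint was taken, so earlier undo entries are dropped."""
        self.journal.clear()

    def write(self, name, value):
        """Journal the old value, then apply the update."""
        self.journal.append((name, self.real.get(name, self._MISSING)))
        self.real[name] = value

    def delete(self, name):
        """Journal the old value, then remove the item."""
        self.journal.append((name, self.real.get(name, self._MISSING)))
        self.real.pop(name, None)

    def restore(self):
        """RESTORE: undo updates in reverse order, back to the last MARK."""
        while self.journal:
            name, old = self.journal.pop()
            if old is self._MISSING:
                self.real.pop(name, None)
            else:
                self.real[name] = old

accounts = BackedUpDataSet({"alice": 100, "bob": 50})
accounts.mark()                 # recovery checkpoint inserted
accounts.write("alice", 70)     # update
accounts.write("carol", 30)     # addition
accounts.delete("bob")          # deletion
accounts.restore()              # system error: back up to the MARK
```

Undoing in reverse order matters when the same item is updated more than once between checkpoints, since the oldest journaled value is then the last one written back.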

Data Sets Requiring Revoke Journals

This  class  of  data  set  includes  output  devices  which  interact  with  and  output 
information  to  an  external  destination,  such  as  a lineprinter,  display,  etc.  A revoke 
journal  needs  to  be  produced  for  this  type  of  device  so  that  upon  the  occurrence  of  a 
system  error,  that  output  which  was  issued  since  the  last  MARK  command  may  be 
marked  as  erroneous.  A revoke  journal  might  also  be  used  to  take  appropriate  action 
to  revoke  those  items  which  were  issued  in  error. 

The  revoke  journal  entries  must  include  all  those  output  results  which  may  be 
potentially  erroneous  upon  the  occurrence  of  a system  error.  For  a particular  device, 
if  it  is  sufficient  to  note  only  the  extent  of  the  erroneous  output  which  was  issued 
since the last rollback recovery checkpoint, then a revoke journal does not need to be
generated,  and  the  interface  process  need  only  indicate  which  output  will  be  re-issued. 


Data Sets Requiring Replay Journals

Input  devices  which  possess  no  internal  state  and  which  are  not  repeatable,  require 
the  generation  of  a replay  journal.  Such  devices  include  keyboards,  input  samplers, 
card  readers,  and  remote  terminals.  Individual  entries  into  the  replay  journal  include 
that  information  which  is  produced  by  the  device  and  which  is  not  repeatable.  Upon 
the  occurrence  of  a system  error,  the  replay  journal  is  used  to  furnish  the  same 
sequence  of  input  data  to  the  program. 
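
The replay journal can be sketched in the same spirit. In the example below the non-repeatable device is modeled as a Python iterator, which can hand out each item only once; the class name ReplayedInput and its methods are hypothetical, and the journal is again kept in memory for brevity.

```python
class ReplayedInput:
    """Virtual data set over a non-repeatable input source (any iterator)."""

    def __init__(self, source):
        self.source = iter(source)
        self.journal = []     # replay journal: inputs read since the last MARK
        self.position = 0     # next journal entry to hand to the program

    def mark(self):
        """MARK: inputs consumed before the checkpoint need not be replayed."""
        self.journal = self.journal[self.position:]
        self.position = 0

    def read(self):
        """Serve journaled input first; otherwise read the device and record it."""
        if self.position == len(self.journal):
            self.journal.append(next(self.source))
        value = self.journal[self.position]
        self.position += 1
        return value

    def restore(self):
        """RESTORE: replay the recorded inputs from the last MARK onwards."""
        self.position = 0

keys = ReplayedInput(["k1", "k2", "k3", "k4"])
first = keys.read()                      # consumed before the checkpoint
keys.mark()
before = [keys.read(), keys.read()]      # inputs seen before the error
keys.restore()                           # error detected: roll back to the MARK
after = [keys.read(), keys.read(), keys.read()]
```

After the RESTORE the program sees the same two inputs again from the journal, and only then does the interface go back to the device for fresh input.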

If  the  data  set  is  used  interactively  for  both  input  and  output,  such  as  a remote  user 
terminal,  and  the  input  received  from  the  device  is  functionally  dependent  upon  the 
output  transactions  issued  to  the  device  then  the  interface  process  should  notify  the 
terminal  user,  telling  him  to  retransmit  all  messages  sent  after  the  message  with  such 
and  such  a serial  number,  which  was  the  last  correct  transaction  preceding  the  last 
rollback  checkpoint.  By  using  serial  numbers,  it  could  check  carefully  that  it  is  not 
causing  a file  to  be  updated  twice. 


When the user's dialogue is interrupted, it is sometimes advantageous that he should
be able to start it again where he left off. If it is a lengthy dialogue, for example
the reordering of complex machinery, it is certainly desirable that he should not have
to go back to the beginning. For this reason, checkpoints may be built into a
dialogue structure. At intervals the decisions made up to that point in the dialogue
will be reviewed, and possibly changed. When the terminal user agrees that they are
correct, they will be recorded.

In  most  terminal  conversations,  there  are  certain  stages  at  which  the  set  of  decisions 
recorded  up  to  that  point  can  be  agreed  upon  as  correct.  This  can  be  regarded  as  a 
closure  in  the  decision-making  process.  Sometimes  it  is  merely  an  arbitrary  stage  in 
data entry. Periodically, at a natural closure, the user should be given a recap of
what has been established up to that point and asked to check it. When the user has
agreed that this is correct, the system will process these transactions.


2.3.4  Processing  and  Storage  Overhead 

Furnishing  these  virtual  capabilities  will  incur  an  overhead  cost  due  to  additional 
processing  and  secondary  storage  requirements.  These  overhead  costs  are: 


T_create - additional processing time required by the virtual data set
interface for the creation and recording of the transaction journals.

T_storage - additional secondary memory required for storage of the data
set transaction journals.

T_memory - additional system memory required for the storage of the
virtual data set interface process code and internal buffering.


When a system error occurs (RESTORE), the additional processing costs are:


T_backup - additional processing time required to back up the data set to the
point of the previous MARK command, undoing all state
modifications.

T_revoke - processing cost of revoking the erroneous output issued.

T_replay - processing cost necessary to replay those input
transactions which are not repeatable.

For the subsequent calculation of recovery load time (Section 2.3.5), let us define a
processing time cost C_i which is the sum of the backup, revoke and replay times
which are incurred during the execution of task i. C_i is the total per recovery
RESTORE time for data sets due to interactions with task i of the program:

\[ C_i = T_{backup} + T_{revoke} + T_{replay} \]


2.3.5  Save  and  Load  Time 

The  quantity  of  program  state  (memory  in  use,  etc.)  which  must  be  saved  when  a 
recovery  checkpoint  is  inserted  may  vary  widely  from  one  point  in  the  program  to 
the  next.  Assume  that  a sufficient  amount  of  secondary  storage  exists,  so  that  it  will 
always  be  possible  to  save  the  entire  program  state.  The  secondary  storage  used  may 
be  disk,  drum,  magnetic  tape,  or  other,  more  advanced  media  [e.g.  laser  photostore]. 
The  media  used  will,  of  course,  affect  the  save  time. 

Associated with each edge (i,j) of the program graph, as seen in Fig. 2.3, are two
numbers: S_ij and L_ij. If the execution of task j follows task i, then the state of the
system after task i is completed and before task j is begun is the collection of the
registers, program status and condition words, primary memory, and the state of all
of the external data sets. The time taken to save the state at this point in the
program to secondary storage, create a recovery checkpoint, and reset the virtual data set
interface process (via the MARK command) is S_ij. The save time S_ij may vary over
other edges (i,j) of the program graph.


Given that a recovery checkpoint was established on edge (i,j) of the program graph,
and that an error occurs during the execution of some succeeding task, say task k, the
time taken to reload the program state from this established checkpoint is reload_ij.

The time taken for the data set interface to process this rollback (via the RESTORE
command) is T_restore. T_restore includes any special action which the
data set interface process must take for handling backing up, revoking and replay.
T_restore is the sum of the RESTORE time C_i over all nodes i which were processed
since the last recovery checkpoint. For example, if a checkpoint were established at
edge (a,b) of the program graph in Fig. 2.4, and recovery is to be performed during
the execution of task f, then:

\[ T_{restore} = \sum_{i \in \{b,d,f\}} C_i \]

The total time taken to restore this particular system to consistency is the load time,
L_ab:

\[ L_{ab} = reload_{ab} + T_{restore} \]
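
As a small worked example of the load time calculation, the sketch below sums the per-task RESTORE costs over the tasks processed since the checkpoint (b, d and f in the example above) and adds the time to reload the program state. All numeric values and the function names are hypothetical.

```python
def restore_time(C, tasks_since_checkpoint):
    """T_restore: sum of the per-task RESTORE costs C_i since the checkpoint."""
    return sum(C[i] for i in tasks_since_checkpoint)

def load_time(reload, C, tasks_since_checkpoint):
    """Load time = time to reload the program state + T_restore."""
    return reload + restore_time(C, tasks_since_checkpoint)

# Hypothetical per-task RESTORE costs, in seconds (backup + revoke + replay).
C = {"a": 0.5, "b": 2.0, "c": 1.0, "d": 0.0, "e": 4.0, "f": 1.5}

# Checkpoint on the edge into task b; the error is handled during task f.
T_restore = restore_time(C, ["b", "d", "f"])     # 2.0 + 0.0 + 1.5
L = load_time(3.0, C, ["b", "d", "f"])           # 3.0 s reload + T_restore
```

Tasks c and e do not contribute because they were not on the path taken in this run.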


2.4  Recovery  Time 


Define the recovery time r at any point (in particular, at a point P, when the error
is diagnosed) in the program to be the interval of time taken to:


1. reconfigure the system if the error was caused by a faulty hardware module,

2. restore the system to the consistent state of the most recent recovery
checkpoint which contains the correct program state, and

3. rerun the program from this state to point P.


Thus, if an error is detected at point P, the recovery time r is the amount of time
lost due to the error. If the error was caused by a transient hardware fault, then the
reconfiguration time is zero. If an error occurs while task i is being processed, and
the error is detected before task i is completed, then the most recent recovery
checkpoint will contain the correct system state. If this is not so, the system needs
to be rolled back to an earlier checkpoint (Fig. 2.5, and Section 2.2.6).

The recovery time r can be quickly determined during program execution. Define
SysClock to be the value of the system clock at point P (when the error is
diagnosed). If there are n previously saved recovery checkpoints on secondary
storage, then the expected recovery time r at point P is:

\[ r = R + \sum_{i=1}^{n} \{ SysClock - CPClock_i + Load_i \} \cdot Pr_i \]

where:

R - is the expected time to reconfigure the system hardware, perhaps
amputating a faulty module. If the error was caused by a transient
fault then R = 0.

CPClock_i - is the system clock time of the i-th recovery checkpoint. If no
recovery checkpoints were previously placed, then CPClock_i is the
value of the system clock at the very start of the program run.

Load_i - is the time taken to reload the program state from the i-th recovery
checkpoint, and restore data sets to consistency with this rollback
checkpoint (Section 2.3.5).

Pr_i - is the probability of the i-th recovery checkpoint containing the
correct state information.


If a reliable method of error detection (Section 2.2.6) is performed before program
tasks complete, then it is possible to partition our system so that errors are not likely
to propagate across task boundaries. If a reliable method of error/consistency
detection is employed, then there will be a high probability that the most recently
established recovery checkpoint will contain the correct state information, and
Pr_1 + Pr_2 + ... + Pr_{n-1} ≈ 0, and Pr_n ≈ 1, so:

\[ r = SysClock - CPClock_n + Load_n + R \]
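
The recovery time expression is cheap to evaluate at run time. A minimal sketch follows, assuming each saved checkpoint is described by its clock time, its load time and the probability that it holds the correct state; the function name and all numeric values are hypothetical.

```python
def expected_recovery_time(sys_clock, checkpoints, R=0.0):
    """r = R + sum_i (SysClock - CPClock_i + Load_i) * Pr_i

    checkpoints: list of (cp_clock, load, pr) tuples, one per saved
                 checkpoint, where pr is the probability that the
                 checkpoint holds the correct state.
    """
    return R + sum((sys_clock - cp_clock + load) * pr
                   for cp_clock, load, pr in checkpoints)

# Hypothetical values, in seconds: two checkpoints, error diagnosed at t = 100.
cps = [(40.0, 6.0, 0.25), (80.0, 4.0, 0.75)]
r = expected_recovery_time(100.0, cps, R=2.0)
# 2 + (100 - 40 + 6) * 0.25 + (100 - 80 + 4) * 0.75 = 2 + 16.5 + 18 = 36.5

# With reliable detection, Pr_n is close to 1 and only the last term remains.
r_last = expected_recovery_time(100.0, [(80.0, 4.0, 1.0)], R=2.0)
```

The second call corresponds to the simplified expression, which needs only the clock time and load time of the most recent checkpoint.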

2.5  Optimal Decision Parameter - B_ij

It  is  not  generally  possible  to  predict  the  precise  amount  of  time  that  a given 
program  task  will  require.  Thus,  it  is  desirable  to  make  the  insertion  of  rollback 
points  a dynamic  procedure.  On  certain  runs  of  a program,  one  may  want  a rollback 
point  inserted  on  a particular  edge  (ij),  while  on  another  run  of  the  same  program, 
with  different  data,  it  would  not  be  optimal  to  place  a rollback  point  on  that  same 
edge.  The  decision  to  insert  rollback  points  should  be  quite  simple  so  that  it  can  be 
done  in  real  time  with  little  or  no  overhead. 

Suppose that at some point P in the program flow, task i was just completed,
and task j is to be executed next. Thus we lie on edge (i,j) of the program graph.
Let r be the recovery time at this point P. Define the optimal decision parameter to
be B_ij. Then, if r > M - B_ij, a recovery checkpoint should be inserted. If
r < M - B_ij, choose not to insert a checkpoint. B_ij is a constant, and the set of
all B_ij are computed before the program is run. Then, dynamically at runtime, after
task i is completed and before task j is processed, the recovery time r is interrogated
(Section 2.4) and a recovery checkpoint is inserted only if r > M - B_ij. Figure 2.6
illustrates this procedure. In general r will vary from one run of the program to the
next, because the time taken to execute a particular task will depend on the input data
to the program. Thus, the insertion of rollback points also varies from run to run,
since the optimal decision is a function of r.
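
The run-time test is a single comparison per edge. In the sketch below the bound M, the table of decision parameters and the recovery time are hypothetical numbers chosen for illustration; how the B_ij values are actually computed is the subject of Chapter 3 and is not shown. The case r = M - B_ij is treated as "do not insert", following the statement that a checkpoint is inserted only if r > M - B_ij.

```python
def should_checkpoint(r, M, B_ij):
    """Insert a checkpoint on edge (i, j) only if r > M - B_ij."""
    return r > M - B_ij

M = 60.0                                     # bound on expected recovery time
B = {("a", "b"): 25.0, ("b", "d"): 10.0}     # precomputed decision parameters

# Same recovery time r = 40 interrogated on two different edges.
decisions = {edge: should_checkpoint(40.0, M, b) for edge, b in B.items()}
```

With these numbers a checkpoint is taken on edge (a,b), where 40 > 35, but not on edge (b,d), where 40 < 50.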

The  analysis  which  follows  in  Chapter  3 determines  the  optimal  placement  of 
recovery  checkpoints  to: 

1)  minimize  the  total  expected  program  running  time,  and 

2)  minimize  the  recovery  overhead  cost. 


These algorithms statically compute the set of optimal decision parameters B_ij.

[Figure 2.6, flowchart: task i just completed and task j is to be processed next;
interrogate the recovery time r = R + \sum_i {SysClock - CPClock_i + Load_i} * Pr_i.]

Figure 2.6 - RUNTIME INSERTION OF RECOVERY CHECKPOINTS


Chapter  3 

ALGORITHM WHICH OPTIMIZES THE INSERTION
OF RECOVERY CHECKPOINTS


3.1  Purpose of the MERT Algorithm - Minimization of the Expected Run Time

The purpose of this algorithm is to minimize the expected run time of the modeled
system under the constraint that the expected recovery time must not exceed a bound
of M time units. The interval M is assumed to be a constant, system defined
bound. From our graph model of program flow, we will need to determine the task
branching probabilities p_ij, and the expected execution time for task i, t_i.


3.2  Estimates of Program Behavior

3.2.1  Probability  of  Occurrence  of  Task  Error 

Other estimates of program behavior are needed for this analysis. Associated with
each task is q_i, the probability that at least one error will occur during the processing
of task i.

This probability, q_i, is primarily a function of the hardware which is supporting the
execution of this task i. If the hardware support cannot be estimated, then q_i might
be approximated by:

\[ q_i = Q / n \]

where Q is the total probability of a system error occurring and n is the number of
runnable tasks in the program.

If one can reasonably estimate the amount of time t_{i,a} that task i is spending on
hardware module a, and the probability of failure Q_a on module a, then a more
reasonable estimate for q_i is:

\[ q_i = \sum_{a \in H_i} (t_{i,a} / t_i) \cdot Q_a \]

where H_i is the set of all hardware modules supporting the execution of task i.
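
Both estimates of the task error probability are easy to compute once the inputs are available. The sketch below is illustrative only; the function names, the module names and all probabilities are hypothetical.

```python
def q_uniform(Q, n):
    """q_i = Q / n: spread the system error probability evenly over n tasks."""
    return Q / n

def q_weighted(t_i, time_on_module, Q_module):
    """q_i = sum over modules a in H_i of (t_ia / t_i) * Q_a.

    time_on_module: {module a: time t_ia that task i spends on module a}
    Q_module:       {module a: probability of failure Q_a on module a}
    """
    return sum((t_ia / t_i) * Q_module[a] for a, t_ia in time_on_module.items())

# Hypothetical figures: a 10 s task spending 8 s on the CPU and 2 s on disk.
q_even = q_uniform(0.02, 8)
q_task = q_weighted(10.0, {"cpu": 8.0, "disk": 2.0},
                    {"cpu": 0.001, "disk": 0.01})
# 0.8 * 0.001 + 0.2 * 0.01 = 0.0028
```

The weighted form lets a task that leans on a less reliable module carry a higher error probability than the uniform estimate would give it.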


3.2.2  Expected  Time  Until  Detection  of  a Task  Error 

Given that an error does occur during the interval in which task i is processing, let
the random variable y_i be the expected time between the initiation of task i, and the
detection of the error. The parameter y_i could be accurately estimated only by a
substantial amount of program analysis and measurement. If a reliable method of
error detection (Section 2.2.6) is performed before program tasks complete, then it is
possible to partition our system so that errors are not likely to propagate across task
boundaries. If this is so, y_i is bounded from above by the expected task execution
time:

\[ y_i < t_i \]


3.2.3  Summary  of  Program  Behavior  Estimates 

Before proceeding with the minimization algorithm, let us briefly recap those
estimates of program flow behavior which we will be using in the following analysis:

t_i - Expected time to execute task i correctly, given that no errors occur. A
random variable (Section 2.2.2).

p_ij - The probability (fixed, independent) that task j will follow task i - i.e., that
edge (i,j) of the program graph will be taken (Section 2.2.4).

q_i - The probability of encountering at least one error during the execution of
task i.

1-q_i - The probability of executing task i successfully.

y_i - The expected time between the initiation of task i and the detection of an
error, if one occurs.

S_ij - The time taken to establish a recovery checkpoint on edge (i,j) (Section
2.3.5).

D_i - The expected recovery time if task i fails. This variable includes:

1. R, the system re-configuration time.

2. C_i, the sum of the backup, revoke and replay processing times
which are incurred during the execution of task i (Section 2.3.4).


M - The maximum bound on the expected recovery time.


3.3 Definition: Expected Task Execution Time

Let us establish $E[t_i]$, the expected execution time for task i, given the probability
of task error $q_i$.

Let $E[t_i]$ be characterized, by a geometric distribution, as:

$$E[t_i] = \sum_{k=0}^{\infty} q_i^k (1 - q_i) \, [t_i + k(R + y_i)]$$

noting that:

$q_i^k (1 - q_i)$ is the probability of k failures followed by a successful execution of
task i.

$t_i + k(R + y_i)$ is the recovery time for k failures plus the expected time to
successfully execute task i without errors.

Then:

$$E[t_i] = \sum_{k=0}^{\infty} q_i^k (1 - q_i) \, t_i + \sum_{k=0}^{\infty} q_i^k (1 - q_i) \, k (R + y_i)$$

$$= (1 - q_i) \, t_i \sum_{k=0}^{\infty} q_i^k + (1 - q_i)(R + y_i) \sum_{k=0}^{\infty} k \, q_i^k$$

$$= t_i + \{ q_i (1 - q_i)(R + y_i) \} \, \frac{d}{dq_i} \left( \sum_{k=0}^{\infty} q_i^k \right)$$

so:

$$E[t_i] = t_i + \{ q_i (1 - q_i)(R + y_i) \} / (1 - q_i)^2$$

$$= t_i + \{ q_i (R + y_i) \} / (1 - q_i)$$
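As a numerical check on this derivation, the closed form can be compared with a truncated version of the defining sum. The following is a minimal Python sketch, not the thesis's BCPL; the function names and the sample values (t = 10, q = 0.1, R = 2, y = 5) are chosen here purely for illustration.

```python
def expected_task_time(t, q, R, y):
    """Closed form: E[t_i] = t_i + q_i (R + y_i) / (1 - q_i)."""
    if not 0 <= q < 1:
        raise ValueError("q must satisfy 0 <= q < 1")
    return t + q * (R + y) / (1 - q)


def expected_task_time_series(t, q, R, y, terms=200):
    """Truncated defining sum over k failures followed by one success."""
    return sum(q**k * (1 - q) * (t + k * (R + y)) for k in range(terms))


# Illustrative values only.
closed = expected_task_time(t=10, q=0.1, R=2, y=5)
series = expected_task_time_series(t=10, q=0.1, R=2, y=5)
```

For error probabilities well below 1 the truncated sum converges quickly, so the two values agree to within floating-point rounding.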


3.4 MERT Algorithm


3.4.1 The Auxiliary f(r) and g(r) Functions

For each node i of the program graph G, let us define a function $f_i(r)$, which is the
minimum total expected execution time of the program from the point after task i
completes until the termination of the program. This minimum expected execution
time includes:

1. the expected program task execution time, and

2. the time required to establish all required recovery checkpoints.


Note that f is a function of the expected recovery time r.

For each edge (i,j) of the program graph G, define a function $g_{ij}(r)$, which is also a
function of the recovery time r. The $g_{ij}(r)$ function is the minimum expected total
execution time of the program after task i completes until the termination of the
program, if task j follows task i. Thus in the computation of $g_{ij}(r)$ it is implicit
that the (i,j) program branch is being taken. The $f_i(r)$ and $g_{ij}(r)$ functions are
illustrated in Fig. 3.1.


3.4.2 The MERT Algorithm

Define $f_i(r) = \infty$ for all nodes i in the program graph, whenever r is greater than
the maximum expected recovery interval, M.


Step 0


Define $f_i(r) = 0$, when $r \le M$, for all exit nodes i of the program graph G.
A node with no successors is an exit node. After the function $f_i(r)$ has been
determined for node i, consider node i to be labeled, else unlabeled. Thus,
at this point, all exit nodes of the program graph are labeled.


k-th Step: (k = 1, 2, ...)


If no unlabeled nodes remain on this k-th step then terminate the algorithm,
having computed the set of optimal decision parameters, {$B_{ij}$}. If there exists
an unlabeled node i, which has all of its successor nodes labeled (Fig. 3.2),
then label node i by computing the function $f_i(r)$ as follows:

1. First, for each labeled node j which succeeds node i (i.e., for each edge
(i,j) emanating from node i) define the function $g_{ij}(r)$ to be:

$$g_{ij}(r) = S_{ij} + E[t_j] + f_j(E[t_j] + L_{ij}) \quad \text{if } r + E[t_j] > M$$

$$g_{ij}(r) = E[t_j] + \min \{ S_{ij} + f_j(E[t_j] + L_{ij}), \; f_j(r + E[t_j]) \} \quad \text{if } r + E[t_j] \le M \qquad \text{equation (1)}$$

2. Second, compute the optimal decision parameter $B_{ij}$ for this edge (i,j). $B_{ij}$
is defined to be the maximum value of the expected recovery time r, for
which:

$$g_{ij}(r) = f_j(r + E[t_j]) + E[t_j] \qquad \text{equation (2)}$$

3. Third, after $g_{ij}(r)$ has been computed for all edges (i,j), compute $f_i(r)$:

$$f_i(r) = \sum_{j \in (i,j)} p_{ij} \, g_{ij}(r) \qquad \text{equation (3)}$$

and label node i, indicating that the function $f_i(r)$ has been determined for
this node.

Note that if $f_i(r) = \infty$ for all $r \ge 0$, then we are not able to meet the
constraint of bounding the expected recovery time by the maximum value of
M time units for this chosen graph formulation of the program, G.

Repeat with the next (k+1st) step of the algorithm.
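The labeling procedure and equations (1) to (3) can be sketched as follows. This is a hedged Python illustration, not the BCPL program of Appendix A. It assumes integer times and evaluates every function on the grid r = 0..M. Branch probabilities are given in per cent and the weighted sum is truncated by integer division, mirroring the normalization in the appendix routine DetFi. The sample data are nodes 3 to 7 of the Chapter 4 program; $E[t_5] = 6$ and $E[t_6] = 7$ are not stated in this text and are inferred here from $B_{35} = 24$ and $B_{36} = 23$.

```python
INF = float("inf")


def mert(succ, E, S, L, p_pct, M):
    """Compute f_i(r), g_ij(r) and B_ij on the integer grid r = 0..M.

    succ[i]       -- list of successor nodes of node i (acyclic graph)
    E[j]          -- expected execution time E[t_j] of task j
    S[(i, j)]     -- checkpoint save time on edge (i, j)
    L[(i, j)]     -- checkpoint load time on edge (i, j)
    p_pct[(i, j)] -- branch probability in per cent
    """
    def value(fn, r):
        # f(r) is infinite once r exceeds the bound M.
        return fn[r] if r <= M else INF

    # Step 0: exit nodes have f_i(r) = 0 for r <= M.
    f = {i: [0] * (M + 1) for i in succ if not succ[i]}
    g, B = {}, {}

    while len(f) < len(succ):
        # k-th step: pick an unlabeled node whose successors are all labeled.
        ready = [i for i in succ
                 if i not in f and all(j in f for j in succ[i])]
        if not ready:
            raise ValueError("program graph is not acyclic")
        i = ready[0]

        for j in succ[i]:
            e = (i, j)
            with_cp = S[e] + value(f[j], E[j] + L[e])
            g[e], B[e] = [], None
            for r in range(M + 1):
                without_cp = value(f[j], r + E[j])
                if r + E[j] > M:                  # equation (1), first case
                    g_r = E[j] + with_cp
                else:                             # equation (1), second case
                    g_r = E[j] + min(with_cp, without_cp)
                g[e].append(g_r)
                if g_r != INF and g_r == E[j] + without_cp:
                    B[e] = r                      # equation (2): largest such r

        # Equation (3): probability-weighted sum over the outgoing edges.
        f[i] = []
        for r in range(M + 1):
            terms = [(p_pct[(i, j)], g[(i, j)][r]) for j in succ[i]]
            if any(v == INF for _, v in terms):
                f[i].append(INF)
            else:
                f[i].append(sum(p * v for p, v in terms) // 100)
    return f, g, B


# Nodes 3..7 of the sample program (E[5] and E[6] are inferred values).
succ = {3: [4, 5, 6], 4: [7], 5: [7], 6: [7], 7: []}
E = {4: 16, 5: 6, 6: 7, 7: 10}
edges = [(3, 4), (3, 5), (3, 6), (4, 7), (5, 7), (6, 7)]
S = {e: 4 for e in edges}
L = {e: 3 for e in edges}
p_pct = {(3, 4): 10, (3, 5): 30, (3, 6): 60,
         (4, 7): 100, (5, 7): 100, (6, 7): 100}
f, g, B = mert(succ, E, S, L, p_pct, M=30)
```

Under these assumptions the sketch yields the decision parameters that Chapter 4 reports for these edges. A real-valued implementation would keep the fractional parts that the integer division drops.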
3.5 Lemma: Termination of Algorithm


The MERT algorithm terminates only after all nodes have been labeled.


Proof (by contradiction)

Assume that the algorithm has terminated, leaving an unlabeled node P. If the
algorithm terminated with an unlabeled node P, then there must exist a subgraph of
successors to node P, all of whose nodes are unlabeled. Since all subgraphs of
acyclic directed graphs are also acyclic, we can follow this path to an exit node. But
all of the exit nodes were labeled on the 0th step of the algorithm. Therefore, by
contradiction, the algorithm could not have terminated.


3.6  Lemma:  Bounds  on  Termination 

The  MERT  algorithm  terminates  within  n steps  if  there  are  n nodes  in  the  program 
graph  G. 


Proof 

Each step of the algorithm will remove at least one unlabeled node from the program
graph G. By Lemma 3.5, the algorithm terminates when all nodes have been
labeled. Therefore it terminates within n steps.
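The step bound can be observed on a small graph. The Python sketch below (an illustration, not part of the thesis's BCPL program) labels one node per step and counts the steps. The successor lists are those of the Chapter 4 sample program, read off the edges listed in Fig. 4.3; the function name is hypothetical.

```python
def labeling_steps(succ):
    """Return (labeling order, number of steps after step 0)."""
    labeled = [i for i in succ if not succ[i]]        # step 0: exit nodes
    steps = 0
    while len(labeled) < len(succ):
        ready = [i for i in succ
                 if i not in labeled and all(j in labeled for j in succ[i])]
        if not ready:
            raise ValueError("graph contains a cycle")
        labeled.append(ready[0])                      # one node per step
        steps += 1
    return labeled, steps


# Sample program graph: seven nodes, node 7 is the only exit node.
succ = {1: [2, 3, 4], 2: [7], 3: [4, 5, 6], 4: [7], 5: [7], 6: [7], 7: []}
order, steps = labeling_steps(succ)
```

Because every step labels at least one node and there are n nodes, the loop cannot run more than n times on an acyclic graph.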


3.7 Proof that the MERT Algorithm Minimizes the Expected Run Time

Proof by induction, that on the k-th step of the algorithm, the value determined for
$B_{ij}$ is the optimal decision parameter described previously in Section 2.5.

3.7.1 Case k=1

For each node i which is labeled on the first step, we have the situation of
Fig. 3.3, in which all nodes j, such that the edge (i,j) exists, are exit nodes.
Since $f_j(r) = 0$ for all of these exit nodes j, equation (1) reduces to:

$$g_{ij}(r) = S_{ij} + E[t_j] \quad \text{if } r + E[t_j] > M$$

$$g_{ij}(r) = E[t_j] \quad \text{if } r + E[t_j] \le M$$

Subcase: $r + E[t_j] > M$

If $r + E[t_j] > M$, a recovery checkpoint must be inserted on edge (i,j), otherwise the
recovery time after task j completes, $r + E[t_j]$, may possibly exceed the maximum
bound of M time units. If a rollback point must be inserted on edge (i,j), the
minimum expected execution time $g_{ij}(r)$ at this point, after the completion of task i,
until the end of the program, is the sum of:

1. The expected task execution time for task j, established previously in Section
3.1.2: $E[t_j]$.

2. The time taken to insert the rollback point on edge (i,j): $S_{ij}$. This includes
saving the program state and those actions which the data set interface process
takes upon receipt of the MARK command.

Thus,

$$g_{ij}(r) = E[t_j] + S_{ij}.$$


Subcase: $r + E[t_j] \le M$

If $r + E[t_j] \le M$, a recovery checkpoint need not be inserted on edge (i,j). Since
node j is an exit node, the minimum expected execution time $g_{ij}(r)$ is the expected
execution time for task j:

$$g_{ij}(r) = E[t_j].$$

The optimal decision, $B_{ij}$, for this case (k=1) when node j is an exit node is to not
insert a rollback point on edge (i,j) as long as:

$$r + E[t_j] \le M$$

Thus, $B_{ij}$ for this particular edge (i,j) is:

$$B_{ij} = M - E[t_j]$$

Note also that $B_{ij}$ is the maximum value of r for which:

$$g_{ij}(r) = f_j(r + E[t_j]) + E[t_j].$$

Computation of $f_i(r)$

The minimum expected time spent in program execution, after task i completes until
the termination of the program, is then the weighted sum of the $g_{ij}(r)$ averaged over
all program branches (i,j) emanating from this node i:

$$f_i(r) = \sum_{j \in (i,j)} p_{ij} \, g_{ij}(r)$$

Thus, the MERT algorithm is true for the case k = 1.


3.7.2 Induction Step

Assume that the algorithm is true for k = 1, 2, ..., x-1. Prove it to be true for the
case k = x.

At this point we are at an arbitrary edge (i,j) of the program graph G, as illustrated
in Fig. 3.4.

If a recovery checkpoint is inserted on edge (i,j) immediately before the execution of
task j, then the expected recovery time after task j completes is:

$$E[t_j] + L_{ij}.$$

If the expected recovery time after task j terminates is $E[t_j] + L_{ij}$, then by the
induction hypothesis, the minimum expected execution time after task j completes,
until the end of the program is:

$$f_j(E[t_j] + L_{ij})$$

Thus the total minimum expected run time, after task i completes and if task j is
executed next, is the sum of:

1. the time taken to establish the recovery checkpoint, $S_{ij}$, plus

2. the expected time to execute task j, $E[t_j]$, plus

3. the minimum expected execution time after task j completes, $f_j(E[t_j] + L_{ij})$.

So:

$$g_{ij}(r) = S_{ij} + E[t_j] + f_j(E[t_j] + L_{ij})$$

If a recovery checkpoint is not inserted on edge (i,j), the expected recovery time
immediately after task j terminates is:

$$r + E[t_j]$$

If the expected recovery time is $r + E[t_j]$, then by the induction hypothesis, the
minimum expected execution time after the completion of task j is:

$$f_j(r + E[t_j]).$$

The total minimum expected run time after task i completes, if task j is executed
next, is the expected time taken to execute task j plus the minimum expected
execution time after task j completes:

$$g_{ij}(r) = E[t_j] + f_j(r + E[t_j]).$$


Subcase: $r + E[t_j] > M$

If $r + E[t_j] > M$, a rollback point must be inserted on edge (i,j) if the expected
recovery time after task j is constrained not to exceed our bound of M time units.
Thus:

$$g_{ij}(r) = S_{ij} + E[t_j] + f_j(E[t_j] + L_{ij})$$

Subcase: $r + E[t_j] \le M$

If $r + E[t_j] \le M$, then the option exists of not inserting a rollback point on edge
(i,j). If it is not inserted, the minimum expected run time of the program is:

$$E[t_j] + f_j(r + E[t_j])$$

If a rollback point is inserted, the minimum expected run time is:

$$S_{ij} + E[t_j] + f_j(E[t_j] + L_{ij})$$

So, picking the method which minimizes the expected execution time:

$$g_{ij}(r) = E[t_j] + \min \{ S_{ij} + f_j(E[t_j] + L_{ij}), \; f_j(r + E[t_j]) \}$$

The optimal decision for the edge (i,j) is to not insert the recovery checkpoint as
long as the minimum expected execution time after processing task j until the end of
the program:

$$f_j(r + E[t_j])$$

plus the expected execution time for task j:

$$E[t_j]$$

equals the minimum expected execution time after the processing of task i, along the
(i,j) program branch:

$$g_{ij}(r)$$

$B_{ij}$ then is that maximum value of r, below which:

$$g_{ij}(r) = f_j(r + E[t_j]) + E[t_j].$$

This is the value of $B_{ij}$ which minimizes the expected execution time.


Computation of $f_i(r)$

Again:

$$f_i(r) = \sum_{j \in (i,j)} p_{ij} \, g_{ij}(r)$$

$f_i(r)$ is the expected run time of the program after task i completes, averaged over all
branches (i,j) emanating from node i.

Q.E.D.


Chapter 4

EXAMPLE  ILLUSTRATING  THE  DYNAMIC  INSERTION  OF 
RECOVERY  CHECKPOINTS 


This chapter illustrates the use and implementation of the previously
developed MERT algorithm. A graph model of a typical sample program segment is
analyzed by the MERT algorithm, and the static optimal decision parameter set
{$B_{ij}$} is computed. Then, using the recovery checkpoint insertion procedure of
Section 2.5, the run-time behavior for this program graph is examined for varying
conditions of execution parameters and error conditions.


4.1  Graph  Model  of  a Typical  Program 

A graph model of the typical program is shown in Fig. 4.1. There are seven distinct
tasks in the program. For simplicity, the Load and Save times for all branches ($L_{ij}$,
$S_{ij}$) are fixed for this example at $L_{ij} = 3$, $S_{ij} = 4$ for all branches (i,j). Any
iterations (i.e., FOR, WHILE, or DO loops) in the program have been coalesced into
a sequence of statements contained in one task. The other parameters for this
program are:

$t_i$ - The expected time to execute task i correctly (Section 2.2.2).

$y_i$ - The expected time between the initiation of task i and the detection
of an error, if one occurs (Section 3.2.2).

$q_i$ - The probability of encountering at least one error during the execution
of task i.

$p_{ij}$ - The probability (fixed and independent) that task j will follow task i
(Section 2.2.4).

Figure 4.1 - GRAPH MODEL OF SAMPLE PROGRAM

Also, assume that task 5 issues output to a lineprinter type data set (the virtual data
set interface process described in Section 2.3.3).

The maximum expected recovery time, M, is 30 time units, and the system
reconfiguration time, R, is 2 time units. The values given in Fig. 4.1 for $E[t_i]$ (the
expected task execution time for task i) are computed (Appendix A) from:

$$E[t_i] = t_i + \{ (R + y_i) \, q_i \} / (1 - q_i) \qquad \text{(Section 3.3)}$$


4.2 Computation of the Optimal Decision Parameter Set

The optimal decision parameter set {$B_{ij}$} for the program represented by Fig. 4.1 is
now obtained by application of the MERT algorithm which was presented in Section
3.4.2.

The MERT algorithm has been coded as a BCPL program in Appendix A. The input
data to this analysis program for our sample program is shown in Appendix B, and
the output results obtained by the MERT analysis (the $f_i(r)$, $g_{ij}(r)$, and the set {$B_{ij}$})
are given in Appendix C.

Applying the MERT algorithm (Section 3.4.2) to the sample program graph in Fig.
4.1:

Step 0

On this initial step of the algorithm we define $f_i(r) = 0$, for $r \le M$, for all exit
nodes of the program graph. In our sample program there is only one exit node,
node 7. So:

$$f_7(r) = \begin{cases} 0, & \text{for } 0 \le r \le 30, \\ \infty, & \text{for } 30 < r, \end{cases}$$

and mark node 7 as being labeled (i.e., $f_7(r)$ has been determined).

Step 1

Node 7 is now labeled, and nodes 2, 4, 5 and 6 are unlabeled, having had all their
successors labeled. Compute $g_{27}(r)$ and $B_{27}$ using equations (1) and (2).

$$g_{27}(r) = \begin{cases} S_{27} + E[t_7] + f_7(E[t_7] + L_{27}) & \text{if } r + E[t_7] > M, \\ E[t_7] + \min \{ S_{27} + f_7(E[t_7] + L_{27}), \; f_7(r + E[t_7]) \} & \text{if } r + E[t_7] \le M, \end{cases}$$

substituting into equation (1).

If $r + E[t_7] > M$ (i.e., $r > 20$) then:

$$S_{27} + E[t_7] + f_7(E[t_7] + L_{27}) = 4 + 10 + f_7(10 + 3) = 14 + f_7(13) = 14.$$

If $r + E[t_7] \le M$ (i.e., $r \le 20$) then:

$$E[t_7] + \min \{ S_{27} + f_7(E[t_7] + L_{27}), \; f_7(r + E[t_7]) \} = 10 + \min \{ 4 + f_7(13), \; f_7(r + 10) \} = 10 + \min \{ 4, 0 \} = 10$$

so:

$$g_{27}(r) = \begin{cases} 10, & \text{for } 0 \le r \le 20, \\ 14, & \text{for } 20 < r \le 30. \end{cases}$$

Note that $g_{27}(r) = f_7(r + E[t_7]) + E[t_7] = f_7(r + 10) + 10$ for all $r \le 20$, so using
equation (2) we find our first optimal decision parameter:

$$B_{27} = 20.$$
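The arithmetic of this step can be replayed directly. The short Python sketch below is an illustration only (the thesis's own program is the BCPL code of Appendix A). It uses the values quoted in the text (M = 30, $S_{27} = 4$, $L_{27} = 3$, $E[t_7] = 10$) and restricts r to integers, which is an assumption of the sketch.

```python
INF = float("inf")
M, S27, L27, E7 = 30, 4, 3, 10


def f7(r):
    """Exit node: zero cost while the recovery bound holds."""
    return 0 if r <= M else INF


def g27(r):
    """Equation (1) for edge (2,7)."""
    with_cp = S27 + f7(E7 + L27)
    if r + E7 > M:
        return E7 + with_cp
    return E7 + min(with_cp, f7(r + E7))


# Equation (2): the largest r for which no checkpoint is the optimal choice.
B27 = max(r for r in range(M + 1) if g27(r) == f7(r + E7) + E7)
```

Evaluating `g27` on either side of r = 20 shows the jump from 10 to 14 that marks the decision parameter.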


Since there is only one edge emanating from node 2, we can compute $f_2(r)$ from
equation (3) as:

$$f_2(r) = p_{27} \, g_{27}(r) = g_{27}(r).$$

$$f_2(r) = \begin{cases} 10, & \text{for } 0 \le r \le 20, \\ 14, & \text{for } 20 < r \le 30, \\ \infty, & \text{for } 30 < r, \end{cases}$$

and label node 2, indicating that $f_2(r)$ has been determined.

Steps 2, 3, 4

Now compute $g_{47}(r)$, $g_{57}(r)$, $g_{67}(r)$ and $B_{47}$, $B_{57}$, and $B_{67}$. Using the same
procedure as above, we determine that:

$$g_{47}(r) = g_{57}(r) = g_{67}(r) = \begin{cases} 10, & \text{for } 0 \le r \le 20, \\ 14, & \text{for } 20 < r \le 30, \end{cases}$$

and:

$$B_{47} = B_{57} = B_{67} = 20.$$

Again, since there is only one edge leaving nodes 4, 5 and 6, we can compute $f_4(r)$,
$f_5(r)$ and $f_6(r)$:

$$f_4(r) = f_5(r) = f_6(r) = \begin{cases} 10, & \text{for } 0 \le r \le 20, \\ 14, & \text{for } 20 < r \le 30, \\ \infty, & \text{for } 30 < r, \end{cases}$$

and label nodes 4, 5 and 6.

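Step 5

Now, since node 3 is unlabeled, and all its successor nodes (4, 5 and 6) are labeled,
we can compute $g_{34}(r)$ and $B_{34}$ from equations (1) and (2):

$$g_{34}(r) = \begin{cases} S_{34} + E[t_4] + f_4(E[t_4] + L_{34}) & \text{if } r + E[t_4] > M, \\ E[t_4] + \min \{ S_{34} + f_4(E[t_4] + L_{34}), \; f_4(r + E[t_4]) \} & \text{if } r + E[t_4] \le M. \end{cases}$$

If $r + E[t_4] > M$ (i.e., $r > 14$) then:

$$S_{34} + E[t_4] + f_4(E[t_4] + L_{34}) = 4 + 16 + f_4(16 + 3) = 20 + f_4(19) = 20 + 10 = 30$$

If $r + E[t_4] \le M$ (i.e., $r \le 14$) then $f_4(r + E[t_4]) = f_4(r + 16)$, which is 10 for
$r \le 4$ and 14 for $4 < r \le 14$, so that:

$$E[t_4] + \min \{ S_{34} + f_4(19), \; f_4(r + 16) \} = 16 + \min \{ 14, \; f_4(r + 16) \}$$

which is 26 for $r \le 4$ and 30 for $4 < r \le 14$. So:

$$g_{34}(r) = \begin{cases} 26, & \text{for } 0 \le r \le 4, \\ 30, & \text{for } 4 < r \le 30. \end{cases}$$

Now note that $g_{34}(r) = f_4(r + E[t_4]) + E[t_4] = f_4(r + 16) + 16$ for all $r \le 14$, so using
equation (2) we find that:

$$B_{34} = 14.$$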


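Now compute $g_{35}(r)$ and $B_{35}$. Using the same method as above, we find that:

$$g_{35}(r) = \begin{cases} 16, & \text{for } 0 \le r \le 14, \\ 20, & \text{for } 14 < r \le 30, \end{cases}$$

$$B_{35} = 24.$$

And for $g_{36}(r)$ and $B_{36}$:

$$g_{36}(r) = \begin{cases} 17, & \text{for } 0 \le r \le 13, \\ 21, & \text{for } 13 < r \le 30, \end{cases}$$

$$B_{36} = 23.$$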


Now, since $g_{ij}(r)$ has been computed for all edges leaving node 3 (i.e., $g_{34}(r)$, $g_{35}(r)$
and $g_{36}(r)$), we can use equation (3) to compute $f_3(r)$:

$$f_3(r) = \sum_{j \in \{4,5,6\}} p_{3j} \, g_{3j}(r) \qquad \text{equation (3)}$$

where $p_{34} = .10$, $p_{35} = .30$ and $p_{36} = .60$, and $g_{ij}(r)$ are those given above. This
computation yields (see Fig. 4.2 for a pictorial representation of this calculation):

$$f_3(r) = \begin{cases} 17, & \text{for } 0 \le r \le 4, \\ 18, & \text{for } 4 < r \le 13, \\ 20, & \text{for } 13 < r \le 14, \\ 21, & \text{for } 14 < r \le 30, \\ \infty, & \text{for } 30 < r, \end{cases}$$

and label node 3, indicating that $f_3(r)$ has been computed.
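The weighted sum for node 3 can be checked with step functions stored as breakpoint lists, loosely in the spirit of the function list kept by the Appendix A program. This Python sketch is illustrative: the helper `step` is hypothetical, the $g_{36}$ values of 17 and 21 are the ones consistent with $B_{36} = 23$ and with the $f_3$ table, and the probabilities are held in per cent with integer division, as in the appendix routine DetFi, which is why 17.6 appears as 17.

```python
INF = float("inf")


def step(breakpoints):
    """Build a step function from (upper_bound, value) pairs sorted by bound."""
    def fn(r):
        for upper, value in breakpoints:
            if r <= upper:
                return value
        return INF                  # beyond the last bound, i.e. r > M
    return fn


g34 = step([(4, 26), (30, 30)])
g35 = step([(14, 16), (30, 20)])
g36 = step([(13, 17), (30, 21)])


def f3(r):
    """Equation (3) for node 3, probabilities in per cent."""
    values = [g34(r), g35(r), g36(r)]
    if INF in values:
        return INF
    return (10 * values[0] + 30 * values[1] + 60 * values[2]) // 100
```

Sampling `f3` at r = 0, 5, 14 and 15 gives one value from each of the four finite intervals in the table.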


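Now compute $g_{12}(r)$, $g_{13}(r)$, $g_{14}(r)$ and $B_{12}$, $B_{13}$, and $B_{14}$. Using the same
procedures, we find that:

$$g_{12}(r) = \begin{cases} 34, & \text{for } 0 \le r \le 10, \\ 38, & \text{for } 10 < r \le 30, \end{cases}$$

$$B_{12} = 10.$$

$$g_{13}(r) = \begin{cases} 24, & \text{for } 0 \le r \le 7, \\ 26, & \text{for } 7 < r \le 8, \\ 27, & \text{for } 8 < r \le 24, \\ 28, & \text{for } 24 < r \le 30, \end{cases}$$

$$B_{13} = 24.$$

$$g_{14}(r) = \begin{cases} 26, & \text{for } 0 \le r \le 4, \\ 30, & \text{for } 4 < r \le 30, \end{cases}$$

$$B_{14} = 14.$$

Now, since $g_{12}(r)$, $g_{13}(r)$ and $g_{14}(r)$ have been computed, we can calculate $f_1(r)$.
Using equation (3) we find that:

$$f_1(r) = \begin{cases} 29, & \text{for } 0 \le r \le 4, \\ 30, & \text{for } 4 < r \le 7, \\ 31, & \text{for } 7 < r \le 10, \\ 33, & \text{for } 10 < r \le 30, \\ \infty, & \text{for } 30 < r, \end{cases}$$

and label node 1.

On this final step of the MERT algorithm we find that all of the nodes are labeled.
The algorithm terminates, having computed the complete optimal decision parameter
set {$B_{ij}$}.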

These  results  are  summarized  in  Fig.  4.3. 


4.3  Example  Execution  of  Sample  Program 

In the following examples, suppose that the given run of the sample program
(represented by Fig. 4.1) executes tasks 1, 3, 5 and 7 sequentially. Also, assume that
the program is initially loaded in time $L_{00} = 1$.

From $f_1(r)$ (Section 4.2) we find that the minimum total expected execution time for
the program, including the time for placement of any required recovery checkpoints, is:

$$E[t_1] + f_1(E[t_1] + L_{00}) = 13 + f_1(13 + 1) = 13 + 33 = 46.$$
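This lookup is a single evaluation of the step function $f_1(r)$ tabulated in Section 4.2. A minimal Python sketch (the names are illustrative, and the language is not the thesis's BCPL):

```python
def f1(r):
    """f_1(r) as tabulated in Section 4.2 (finite for 0 <= r <= 30)."""
    for upper, value in [(4, 29), (7, 30), (10, 31), (30, 33)]:
        if r <= upper:
            return value
    return float("inf")


E_t1, L00 = 13, 1                    # E[t_1] and the initial load time
total = E_t1 + f1(E_t1 + L00)        # 13 + f1(14)
```

The argument 14 falls in the interval 10 < r <= 30, so the last finite entry of the table is the one that applies.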

The following examples make use of the checkpoint insertion algorithm presented in
Section 2.5 (and Fig. 2.6).


Figure 4.3 - TABLE OF MERT COMPUTED OPTIMAL DECISION PARAMETERS FOR SAMPLE PROGRAM

OPTIMAL DECISION PARAMETER SET

    Parameter    Value
    B12          10
    B13          24
    B14          14
    B27          20
    B47          20
    B57          20
    B67          20
    B34          14
    B35          24
    B36          23


4.3.1 Example 1

Suppose that in this given run of the program the task execution times are:

task 1 = 1
task 3 = 2
task 5 = 3
task 7 = 5

After task 1 completes, we interrogate the recovery time, r:

r = R + SysClock - CPClock + Load (= $L_{00}$)

= 2 + (1 - 0) + 1 = 4

and $r \le M - B_{13} = 30 - 24 = 6$, so we do not insert a recovery checkpoint. Process
task 3.

After task 3 completes, again interrogate r:

r = 2 + (1 + 2) + 1 = 6

$r \le M - B_{35} = 30 - 24 = 6$, so no recovery checkpoint is placed. Process task 5.

After task 5 completes, interrogate r:

r = 2 + (1 + 2 + 3) + 1 = 9

and $r \le M - B_{57} = 30 - 20 = 10$, so go on to process task 7.


Thus, for the execution characteristics of this particular example, it was not necessary
to insert any recovery checkpoints.


4.3.2 Example 2

On this run of the program, the task execution times are:

task 1 = 2
task 3 = 5
task 5 = 4
task 7 = 5

After task 1:

r = 2 + (2 - 0) + 1 = 5

$r \le M - B_{13} = 30 - 24 = 6$, so go on to process task 3. After task 3 completes,
again interrogate r:

r = 2 + (2 + 5) + 1 = 10

$r > M - B_{35} = 30 - 24 = 6$, so we insert a recovery checkpoint on edge (3,5) (which
consumes $S_{35} = 4$ time units). Now process task 5. After task 5 completes,
interrogate r:

r = R + 4 + $L_{35}$ = 2 + 4 + 3 = 9

$r \le M - B_{57} = 30 - 20 = 10$, so go on to process task 7.

Thus, in Example 2, one dynamic recovery checkpoint was placed between the
execution of task 3 and task 5.
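Examples 1 and 2 can be replayed with a small simulation. The Python sketch below (not the thesis's BCPL) is a reading of the two examples rather than the insertion procedure of Section 2.5 itself, which is not reproduced here. It assumes that r = R + (SysClock - CPClock) + Load, that a checkpoint is inserted on edge (i,j) when r exceeds $M - B_{ij}$, and that inserting one advances the clock by the save time and resets CPClock. The function and variable names are hypothetical.

```python
M, R = 30, 2                       # recovery bound and reconfiguration time
SAVE, LOAD, LOAD0 = 4, 3, 1        # S_ij, L_ij and the initial load time L_00
B = {(1, 3): 24, (3, 5): 24, (5, 7): 20}   # from Fig. 4.3


def run(path, task_time):
    """Replay one run; return (edges that got a checkpoint, r after each task)."""
    sys_clock = 0        # elapsed time
    cp_clock = 0         # time at which the current checkpoint was taken
    load = LOAD0         # time needed to reload the current checkpoint
    inserted, trace = [], []
    for i, j in zip(path, path[1:]):
        sys_clock += task_time[i]
        r = R + (sys_clock - cp_clock) + load
        trace.append(r)
        if r > M - B[(i, j)]:          # test as applied in Examples 1 and 2
            sys_clock += SAVE          # establishing the checkpoint costs S_ij
            cp_clock = sys_clock
            load = LOAD
            inserted.append((i, j))
    return inserted, trace


example1 = run([1, 3, 5, 7], {1: 1, 3: 2, 5: 3, 7: 5})
example2 = run([1, 3, 5, 7], {1: 2, 3: 5, 5: 4, 7: 5})
```

The only run-time work per task boundary is one subtraction, two additions and a comparison against a precomputed parameter, which is the low overhead the chapter summary refers to.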


4.3.3 Example 3

On this run of the program, let the task execution times remain the same as above
(Example 2):

task 1 = 2
task 3 = 5
task 5 = 4
task 7 = 5

But on this run, an error is detected during the processing of task 5.

When the error is detected, we roll back to the previously placed checkpoint which
preceded task 5 (i.e., the one which was placed on edge (3,5)), and back up the virtual
data set which interfaces with task 5 (via the RESTORE command, which will cause
the output issued by task 5 to be revoked). This reloading process takes:

R + $L_{35}$ (= Reload + Restore) = 2 + 3 = 5 time units.

Assuming that the system reconfiguration (R) fixed the faulty module which caused
task 5 to fail, and that this reloaded checkpoint contained the correct state
information, we find that task 5 now executes successfully. After task 5 finishes:

r = R + 4 + $L_{35}$ = 2 + 4 + 3 = 9

and $r \le M - B_{57} = 10$, so continue on to process task 7.


4.4 Summary


In this chapter, the computation of the set {$B_{ij}$} and its execution time use have
been demonstrated. It has been shown that the optimal insertion of recovery
checkpoints is a dynamic procedure, and is a function of the runtime characteristics
- input parameters, actual task execution times, etc. - of the program. Given that
the set {$B_{ij}$} has been determined, the insertion of recovery checkpoints requires only
a minimal runtime computational overhead.

This thesis has described a recovery method which guarantees that a computer system
and its associated data sets will be restored to an operational and consistent state
within a given amount of time, minimizing the total overhead cost of creating
recovery checkpoints.


Appendix A

THE MERT ANALYSIS PROGRAM


// Program for computation of MERT algorithm
// Input parameters stored on file: G.IN

get "STREAMS.D"     // Stream Definitions

external
[   // system routines
    Gets
    keys
    Puts
    Wss
    Ws
    OpenFile
    Closes
    DeleteFile
]

manifest
[
    nmax = 100          // max number of nodes
    emax = 100          // max number of edges
    fmax = 300          // breakpoint storage in function list
    infinity = #77777   // our 16-bit infinity
]

static
[
    N           // node list
    E           // edge list
    F           // function list
    diskin      // disk input file
    diskout     // disk output file
    nodes       // number of nodes
    edges       // number of edges
    M           // max recovery time
    R           // re-configuration time
    fnext       // free pointer into Function list chain
]

structure string:
[   // string template
    length byte
    char^1,255 byte
]

structure node:
[   // each entry corresponds to one node in the graph
    indx^1,nmax:
    [
        count word      // number of successors to this node
        plink word      // list of predecessors - pointer to
                        // EDGE list
        slink word      // list of successors - pointer to
                        // EDGE list
        ti word         // expected running time of this node
        Eti word        // expected task execution time - computed by DetEti
        yi word         // expected time until detection of error
        qi word         // probability of error occurring (per cent)
        flink word      // pointer to first element of the
                        // f function in FUNCTION list
        labelled word   // this vertex is labelled
    ]
]

structure edge:
[   // each entry corresponds to one edge in the graph
    indx^1,emax:
    [
        pred word       // predecessor node
        plink word      // linked list (in EDGE) of
                        // predecessors
        succ word       // successor node
        slink word      // linked list (in EDGE) of
                        // successors
        load word       // load time for this (pred => succ)
                        // edge
        save word       // save time for this (pred => succ)
                        // edge
        pij word        // probability of taking this program branch
        glink word      // pointer to first element of the g
                        // function in FUNCTION list
        Bij word        // decision variable B(i,j)
    ]
]

structure function:
[   // each entry corresponds to a breakpoint of the
    // f or g function
    indx^1,fmax:
    [
        x word          // abscissa of discontinuity
        y word          // ordinate of discontinuity
        xylink word     // points to the next breakpoint
    ]
]


let Main () be
[Main

    // Create the Node, Edge and Function storage areas
    let V = vec (size node)/16;      N = V
    let V = vec (size edge)/16;      E = V
    let V = vec (size function)/16;  F = V

    Ws ("*cStart of Program*c")      // Initialization on screen

    diskin = OpenFile ("G.IN", ksTypeReadOnly, charItem)
    if diskin eq 0 then
    [   // Input file not on directory
        Ws ("*cFile: G.IN not on disk")
        goto FIN
    ]

    DeleteFile ("G.OUT")
    diskout = OpenFile ("G.OUT", ksTypeReadWrite, charItem)

    Filllists (diskin, diskout)
    Printlists (diskout)
    Mert ()                          // The MERT Algorithm
    Printlists (diskout)

FIN: Wrapup ()

]Main


and Mert () be
[Mert

    // The MERT Algorithm - Chapter 3

    // initialize: for all terminal nodes: F(r) = 0 for r le M
    for i = 1 to nodes do
    [
        if N>>node.indx^i.count ne 0 then loop
        N>>node.indx^i.flink = fnext
        Enterxy (M, infinity, 0)             // store breakpoints
        N>>node.indx^i.labelled = true       // label exit node
        PrintF (i, 0, N>>node.indx^i.flink, diskout)
    ]

LOOP:
    // look for an unlabelled vertex which has all of its successors
    // labelled
    let found = false
    for i = 1 to nodes do
    [
        if N>>node.indx^i.labelled then loop
        let successorlabelled = true
        let ptr = N>>node.indx^i.slink
        while ptr ne 0 do
        [
            let j = E>>edge.indx^ptr.succ
            if not N>>node.indx^j.labelled then
            [
                successorlabelled = false
                break
            ]
            ptr = E>>edge.indx^ptr.slink
        ]
        if not successorlabelled then loop

        // at this point we know that all of the successors
        // to node i are labelled
        found = true

        let ijptr = N>>node.indx^i.slink
        while ijptr ne 0 do
        [
            // for all edges j (i => j) compute Gij(r)
            let j = E>>edge.indx^ijptr.succ
            Wss (diskout, "*cVertex i is unlabelled and vertex j is labelled (i,j)= ")
            Wds (diskout, i)
            Wds (diskout, j)
            DetGij (i, j, ijptr)
            PrintF (i, j, E>>edge.indx^ijptr.glink, diskout)

            // now set Bij
            E>>edge.indx^ijptr.Bij = DetBij (i, j, ijptr)
            // print Bij
            Wss (diskout, "*cBij: (i, j, Bij) ")
            Wds (diskout, i)
            Wds (diskout, j)
            Wds (diskout, E>>edge.indx^ijptr.Bij)
            ijptr = E>>edge.indx^ijptr.slink
        ]

        // now compute fi(r)
        let stop = DetFi (i)
        PrintF (i, 0, N>>node.indx^i.flink, diskout)

        // label node i to show that Fi(r) has been computed
        N>>node.indx^i.labelled = true

        if stop then
        [
            Wss (diskout, "*cFi(r) = infinity for 0 le r le M")
            Wss (diskout, "*cNode = ")
            Wds (diskout, i)
            return
        ]
    ]

    // try to find another node whose successors are all labelled
    if found then goto LOOP

]Mert


and DetGij (i, j, ijptr) be
[DetGij

    // determine Gij(r)
    //
    // r+E[tj] gr M:  Gij(r) = Sij + E[tj] + fj(E[tj] + Lij)
    // r+E[tj] le M:         = E[tj] + min { fj(r+E[tj]),
    //                                       Sij + fj(E[tj] + Lij) }

    let Etj = N>>node.indx^j.Eti
    let bp = M-Etj                           // breakpoint
    let Lij = E>>edge.indx^ijptr.load
    let Sij = E>>edge.indx^ijptr.save
    let Fjptr = N>>node.indx^j.flink
    let si = Sij + Evalfg (Fjptr, Lij+Etj)

    // now construct Gij(r)
    let ysave = 0
    let sp = 0
    E>>edge.indx^ijptr.glink = fnext

    for r = 0 to bp do
    [
        let yval = Evalfg (Fjptr, r + Etj)
        if yval gr si then yval = si
        if (yval + Etj) ne ysave then
        [
            if sp ne 0 then F>>function.indx^sp.xylink = fnext
            sp = fnext
            Enterxy (r, yval + Etj, 0)
            ysave = yval + Etj
        ]
    ]

    // now enter breakpoint value if it isn't there
    if (si + Etj) ne ysave then
    [
        if sp ne 0 then F>>function.indx^sp.xylink = fnext
        sp = fnext
        Enterxy (bp, si + Etj, 0)
    ]

    F>>function.indx^sp.xylink = fnext
    Enterxy (M, infinity, 0)

]DetGij


and DetFi (i) = valof
[DetFi

    // Determine fi(r) = SUM over all edges (i,j) of Gij(r)*pij, r le M
    let stop = false
    let ijptr = N>>node.indx^i.slink
    N>>node.indx^i.flink = fnext

    let ysave = 0
    let sp = 0
    for r = 0 to M do
    [rloop
        let infinflag = false
        let ysum = 0
        let flag = false
        let ijptr = N>>node.indx^i.slink
        while ijptr ne 0 do
        [   // pij * gij(r)
            let Gijptr = E>>edge.indx^ijptr.glink
            let pij = E>>edge.indx^ijptr.pij
            let Gval = Evalfg (Gijptr, r)
            if Gval eq infinity then infinflag = true
            ysum = ysum + (Gval * pij)
            ijptr = E>>edge.indx^ijptr.slink
        ]
        ysum = ysum / 100                    // Normalize; pij in per cent
        if infinflag then ysum = infinity
        if ysave ne ysum then
        [   // Enter this breakpoint into the function list
            if sp ne 0 then F>>function.indx^sp.xylink = fnext
            sp = fnext
            Enterxy (r, ysum, 0)
            ysave = ysum                     // remember the last value entered
        ]
        if ysave eq infinity then
            if r ls M then stop = true
    ]rloop

    resultis stop
]DetFi


and DetBij (i, j, ijptr) = valof
[DetBij

// Compute Bij such that:
// gij(r) = fj(r + E[tj]) + E[tj] for r le Bij

let Etj = N>>node.indx^j.Eti
let Gijptr = E>>edge.indx^ijptr.glink
let Fjptr = N>>node.indx^j.flink

for r = 0 to M do
  [ // loop while Gij (r) = Fj (r + E[tj]) + E[tj]
  if (Evalfg (Fjptr, r + Etj) + Etj) ne
      Evalfg (Gijptr, r) then resultis r
  ]

resultis M
]DetBij


and DetEti (i) be
[DetEti

// Determine E[ti] for node i:
// E[ti] = ti + ( (R+yi)*qi / (1-qi) )

let ti = N>>node.indx^i.ti
let yi = N>>node.indx^i.yi
let qi = N>>node.indx^i.qi // note that qi is in %

let Eti = ti + (((R+yi)*qi) / (100 - qi))

N>>node.indx^i.Eti = Eti
]DetEti


and Filllists (strmin, strmout) be
[Filllists

// create the EDGE and NODE lists from the graph model
// Input parameter file (strmin) on file: G.IN

Wss (strmout, "*cEnter the Maximum Recovery Time M: ")
M = Getnum (strmin, strmout)
Wss (strmout, "*cEnter the Re-configuration Time R: ")
R = Getnum (strmin, strmout)
Wss (strmout, "*cEnter the number of nodes in graph: ")
nodes = Getnum (strmin, strmout)
edges = 1

for i = 1 to nodes do
  [iloop
  Wss (strmout, "*cEnter (Ti Yi Qi(%) for node ")
  Wds (strmout, i)
  Wss (strmout, ": ")
  N>>node.indx^i.ti = Getnum (strmin, strmout)
  N>>node.indx^i.yi = Getnum (strmin, strmout)
  N>>node.indx^i.qi = Getnum (strmin, strmout)

  // Determine E[ti]
  DetEti (i)

  Wss (strmout, "For each successor to node ")
  Wds (strmout, i)
  Wss (strmout, " enter:*c")
  Wss (strmout, " (Successor Node, Load Time, Save Time, Pij(%) )*c")

  let cnt = 0
  let tpij = 0
  while true do
    [edgeloop
    Wss (strmout, " ")
    let succ = Getnum (strmin, strmout)
    if succ eq 0 then
      [
      if tpij ne 100 then Wss (strmout,
          "*cSum of pij's neq 100 ******")
      break
      ]

    cnt = cnt + 1
    E>>edge.indx^edges.succ = succ
    E>>edge.indx^edges.pred = i
    E>>edge.indx^edges.load = Getnum (strmin, strmout)
    E>>edge.indx^edges.save = Getnum (strmin, strmout)
    E>>edge.indx^edges.pij = Getnum (strmin, strmout)
    tpij = tpij + E>>edge.indx^edges.pij
    E>>edge.indx^edges.glink = 0
    E>>edge.indx^edges.Bij = 0
    test (cnt eq 1)
      ifso E>>edge.indx^edges.slink = 0
      ifnot E>>edge.indx^edges.slink = edges - 1
    edges = edges + 1
    ]edgeloop

  test (cnt eq 0)
    ifso N>>node.indx^i.slink = 0
    ifnot N>>node.indx^i.slink = edges - 1

  N>>node.indx^i.count = cnt
  N>>node.indx^i.flink = 0
  N>>node.indx^i.labelled = false
  ]iloop

edges = edges - 1

// create the predecessor linked list

for i = 1 to nodes do
  [iloop
  let ptr = 0
  for j = 1 to edges do
    [jloop
    if E>>edge.indx^j.succ ne i then loop
    test (ptr eq 0)
      ifso E>>edge.indx^j.plink = 0
      ifnot E>>edge.indx^j.plink = ptr
    ptr = j
    ]jloop
  N>>node.indx^i.plink = ptr
  ]iloop

// initialize Function list by linking the free pointer chain

for i = 1 to fmax do
  [
  F>>function.indx^i.xylink = i + 1
  ]

F>>function.indx^fmax.xylink = 0 // end of chain
fnext = 1

]Filllists


and Printlists (strmout) be
[Printlists

// Print the contents of the Node list

Wss (strmout,"*cNode List:*c")
Wss (strmout," Node Count Plink Slink")
Wss (strmout," Ti Yi Qi-% E[ti] Flink Labelled*c")
for i = 1 to nodes do
  [iloop
  Wds (strmout,i)
  Wds (strmout,N>>node.indx^i.count)
  Wds (strmout,N>>node.indx^i.plink)
  Wss (strmout," ")
  Wds (strmout,N>>node.indx^i.slink)
  Wds (strmout,N>>node.indx^i.ti)
  Wds (strmout,N>>node.indx^i.yi)
  Wds (strmout,N>>node.indx^i.qi)
  Wds (strmout,N>>node.indx^i.Eti)
  Wds (strmout,N>>node.indx^i.flink)
  test N>>node.indx^i.labelled
    ifso Wss (strmout, " true*c")
    ifnot Wss (strmout, " false*c")
  ]iloop

// Print the contents of the Edge list

Wss (strmout,"*c*cEdge List:*c")
Wss (strmout," Index Pred Plink Succ ")
Wss (strmout,"Slink Load Save Pij-% Glink Bij*c")
for j = 1 to edges do
  [jloop
  Wds (strmout,j)
  Wds (strmout,E>>edge.indx^j.pred)
  Wds (strmout,E>>edge.indx^j.plink)
  Wds (strmout,E>>edge.indx^j.succ)
  Wds (strmout,E>>edge.indx^j.slink)
  Wds (strmout,E>>edge.indx^j.load)
  Wds (strmout,E>>edge.indx^j.save)
  Wds (strmout,E>>edge.indx^j.pij)
  Wds (strmout,E>>edge.indx^j.glink)
  Wds (strmout,E>>edge.indx^j.Bij)
  Wss (strmout,"*c")
  ]jloop

]Printlists


and Evalfg (ptr, x) = valof
[Evalfg

// evaluate the f or g function (whose first element is pointed
// to by ptr) with argument x

x = x+1

let result = 0

while ptr ne 0 do
  [
  if x le F>>function.indx^ptr.x then break
  result = F>>function.indx^ptr.y
  ptr = F>>function.indx^ptr.xylink
  ]

resultis result
]Evalfg


and Enterxy (x, y, link) be
[Enterxy

// make an (x,y) entry on the Function list
let i = fnext

fnext = F>>function.indx^i.xylink
F>>function.indx^i.x = x
F>>function.indx^i.y = y
F>>function.indx^i.xylink = link
if fnext eq 0 then
  [
  // no more space
  Wss (diskout, "*cFunction list filled")
  Gets (keys)
  ]

]Enterxy


and PrintF (i, j, ptr, strmout) be
[PrintF

// Print either Fi(r) or Gij(r)

if j eq 0 then
  [
  Wss (strmout,"*cFunction - F")
  Wds (strmout, i)
  Wss (strmout,"(r) ")
  ]

if j ne 0 then
  [
  Wss (strmout,"*cFunction - G")
  Wds (strmout,i); Wds (strmout,j)
  Wss (strmout,"(r) ")
  ]

while ptr ne 0 do
  [
  Wss (strmout, "*c ")
  Wds (strmout,F>>function.indx^ptr.x)
  let yval = F>>function.indx^ptr.y
  test yval eq infinity
    ifso Wss (strmout, " inf")
    ifnot Wds (strmout, yval)
  ptr = F>>function.indx^ptr.xylink
  ]

Wss (strmout, "*c")

]PrintF


and Getnum (strmin, strmout) = valof
[Getnum

// return a binary number from the keyboard


outstr>>string.char^j = $ 
]

outstr>>string.length = 7
Wss (strm, outstr)

]Wds


and Wrapup () be
[Wrapup

// Close disk files, etc.

Closes (diskin)
Closes (diskout)

Ws ("*cEnd of Program")
finish
]Wrapup


Appendix B

SAMPLE  INPUT  DATA  FOR  MERT  ANALYSIS  PROGRAM 


Enter the Maximum Recovery Time M: 30
Enter the Re-configuration Time R: 2
Enter the number of nodes in graph: 7

Enter (ti yi qi(%) for node 1: 12 10 6
For each successor to node 1 enter:
(Successor Node, Load time, Save time, Pij(%) )
2 3 4 50
3 3 4 25
4 3 4 25

Enter (ti yi qi(%) for node 2: 20 3 3
For each successor to node 2 enter:
(Successor Node, Load time, Save time, Pij(%) )
7 3 4 100

Enter (ti yi qi(%) for node 3: 5 2 8
For each successor to node 3 enter:
(Successor Node, Load time, Save time, Pij(%) )
4 3 4 10
5 3 4 30
6 3 4 60

Enter (ti yi qi(%) for node 4: 15 12 5
For each successor to node 4 enter:
(Successor Node, Load time, Save time, Pij(%) )
7 3 4 100

Enter (ti yi qi(%) for node 5: 6 4 1
For each successor to node 5 enter:
(Successor Node, Load time, Save time, Pij(%) )
7 3 4 100

Enter (ti yi qi(%) for node 6: 7 0 0
For each successor to node 6 enter:
(Successor Node, Load time, Save time, Pij(%) )
7 3 4 100

Enter (ti yi qi(%) for node 7: 10 5 2
For each successor to node 7 enter:
(Successor Node, Load time, Save time, Pij(%) )
<EOF> 
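
The records above follow the order in which Filllists reads them: one (ti, yi, qi) triple per node, then one (successor, load, save, pij) record per outgoing edge. As a rough cross-check of the sample data, the sketch below re-enters the successor records and reports any node whose branch probabilities do not add up to 100 per cent, which is the condition behind the program's "Sum of pij's neq 100" warning. It is written in Python and is not part of the thesis program; the dictionary layout and the name `nodes_with_bad_sums` are assumptions made for illustration. Nodes without successors are skipped here, which is a choice of this sketch rather than something taken from the listing.

```python
# Successor records re-entered from the sample input:
# node -> list of (successor, load time, save time, pij in per cent).
SAMPLE_EDGES = {
    1: [(2, 3, 4, 50), (3, 3, 4, 25), (4, 3, 4, 25)],
    2: [(7, 3, 4, 100)],
    3: [(4, 3, 4, 10), (5, 3, 4, 30), (6, 3, 4, 60)],
    4: [(7, 3, 4, 100)],
    5: [(7, 3, 4, 100)],
    6: [(7, 3, 4, 100)],
    7: [],  # terminal node: no successors
}


def nodes_with_bad_sums(edges):
    """Return the nodes that have successors whose pij values do not sum to 100."""
    bad = []
    for node, records in edges.items():
        total = sum(pij for _succ, _load, _save, pij in records)
        if records and total != 100:
            bad.append(node)
    return bad


print(nodes_with_bad_sums(SAMPLE_EDGES))  # prints [] for the sample data
```

An empty list means every branching node in the sample graph has a complete probability distribution over its successors.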


Appendix  C 

OUTPUT  DATA  FROM  MERT  ANALYSIS  PROGRAM 


(Note: fi(r) and gij(r) are step functions. They are stored by storing their breakpoint
values, i.e., f7(r): 30 ∞ corresponds to:

f7(r) = 0, for r < 30,
        ∞, for r ≥ 30.)
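
The breakpoint representation can be made concrete with a small sketch. The Python code below is an illustration, not the thesis program: it mirrors the logic of Evalfg, DetGij and DetBij from Appendix A under the assumption that a function is held as an ascending list of (breakpoint, value) pairs, and the names `eval_step`, `build_g` and `det_b` are made up for this example. The demo uses the sample values M = 30, load time 3 and save time 4, and assumes E[t7] = 10, which is the value implied by the output for edge 2 7.

```python
import math

INF = math.inf  # stands in for the program's "infinity" constant


def eval_step(points, x):
    """Evaluate a step function stored as ascending (breakpoint, value) pairs.

    The value is 0 before the first breakpoint, and each stored value
    applies from its breakpoint up to (not including) the next one.
    """
    result = 0
    for bx, by in points:
        if x < bx:
            break
        result = by
    return result


def build_g(f_j, e_tj, load, save, m):
    """Breakpoints of g_ij(r) = E[tj] + min(f_j(r + E[tj]), S_ij + f_j(E[tj] + L_ij))."""
    bp = m - e_tj  # largest r with r + E[tj] <= M
    si = save + eval_step(f_j, load + e_tj)
    points, ysave = [], 0
    for r in range(bp + 1):
        y = min(eval_step(f_j, r + e_tj), si) + e_tj
        if y != ysave:  # record only changes of value
            points.append((r, y))
            ysave = y
    if si + e_tj != ysave:
        points.append((bp, si + e_tj))
    points.append((m, INF))
    return points


def det_b(f_j, g_ij, e_tj, m):
    """First r at which g_ij(r) differs from f_j(r + E[tj]) + E[tj], else M."""
    for r in range(m + 1):
        if eval_step(f_j, r + e_tj) + e_tj != eval_step(g_ij, r):
            return r
    return m


f7 = [(30, INF)]
g27 = build_g(f7, e_tj=10, load=3, save=4, m=30)
print(g27)                            # [(0, 10), (20, 14), (30, inf)]
print(det_b(f7, g27, e_tj=10, m=30))  # 20
```

Under these assumptions the sketch reproduces the breakpoints 0 10, 20 14, 30 ∞ and the B value 20 listed for edge 2 7.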

Function - f7(r)
30 ∞

Vertex i is unlabelled and vertex j is labelled (i,j) = 2 7
Function - g27(r)
0 10
20 14
30 ∞
Bij: (i, j, Bij) 2 7 20

Function - f2(r)
0 10
20 14
30 ∞

Vertex i is unlabelled and vertex j is labelled (i,j) = 4 7
Function - g47(r)
0 10
20 14
30 ∞
Bij: (i, j, Bij) 4 7 20

Function - f4(r)
0 10
20 14
30 ∞

Vertex i is unlabelled and vertex j is labelled (i,j) = 5 7
Function - g57(r)
0 10
20 14
30 ∞
Bij: (i, j, Bij) 5 7 20

Function - f5(r)
0 10
20 14
30 ∞


Vertex i is unlabelled and vertex j is labelled (i,j) = 6 7
Function - g67(r)
0 10
20 14
30 ∞
Bij: (i, j, Bij) 6 7 20

Function - f6(r)
0 10
20 14
30 ∞

Vertex i is unlabelled and vertex j is labelled (i,j) = 3 6
Function - g36(r)
0 17
13 21
30 ∞
Bij: (i, j, Bij) 3 6 23

Vertex i is unlabelled and vertex j is labelled (i,j) = 3 5
Function - g35(r)
0 16
14 20
30 ∞
Bij: (i, j, Bij) 3 5 24

Vertex i is unlabelled and vertex j is labelled (i,j) = 3 4
Function - g34(r)
0 26
4 30
30 ∞
Bij: (i, j, Bij) 3 4 14

Function - f3(r)
0 17
4 18
13 20
14 21
30 ∞

Vertex i is unlabelled and vertex j is labelled (i,j) = 1 4
Function - g14(r)
0 26
4 30
30 ∞
Bij: (i, j, Bij) 1 4 14

Vertex i is unlabelled and vertex j is labelled (i,j) = 1 3
Function - g13(r)